The folio 




s of code accomplish two tasks: 
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I) Calculation of the topomeric conformation for a particular 
molecule, assuming that the molecule is referenced by a particular 
row of a Tripos Molecular Spreadsheet (MSS). With minor 
adaptations this code could be used in other molecular modeling 
environments, such as Cerius 2, Quanta, or Insight. 

II) Calculation of the line slope assuming that the biological 
data and one or more columns of property data are stored in a Tripos 
Molecular Spreadsheet (MSS). Almost any other software for 
manipulating data in a spreadsheet or other tabular representation 
could be adapted to perform similar calculations, assuming a 
Tanimoto function for expressing "distances" between bitsets of equal 
cardinality. 

Both sections of code include procedures written in two languages. 
The first is C, familiar to all programmers, and includes both all 
specialized structure declarations and also brief explanations of all 
functions used. The second is SPL, an interpretative language 
available within the SYBYL molecular modeling program, whose 
syntax is similar to a Unix shell script. The SPL language is described 
fully in the volume entitled SPL Manual , found within the 
documentation set for SYBYL 6.2, release date July 1995. This volume 
includes descriptions of all "expression generators" (functions 
returning a value) and "macro commands" not specifically explained 
below. 

I. Topomeric Field Code: 

A. SPL macro CHOMIBUILD3D. To build topomerically aligned 
3D models, the third argument must have the value ALIGN, and the 
global associative array element CHOM!Align[ALICYC] must have 
the value All_trans. Code to allow user adjustment of these and 
other 3D model-building parameters appearing in this code as other 
elements of CHOM!Align[] is not shown. 

B. Under these circumstances the following SPL macro 
GHOM!Alltrans sets all torsions provided to their topomeric values. 

C. To determine the atoms defining each torsion to be adjusted, 
CHOMIAlltrans invokes the expression generator % trans_path(), 
which executes the following C subroutine S YB_M GEN_CONN_BEST, 
with its associated subroutines syb_mgen_conn_att_atoms , 
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get_path_mw, get_path_xyz, and (if debugging) ashow. No user- 
adjustable values are used by this code. All non-obvious include files 
and a brief functional description of subroutines external to this code 
are provided in section III below. 

D. The computation of rotatable-bond-attenuated steric (and/or 
electrostatic, hydrogen bonding) fields for the topomerically aligned 
conformation is carried out by the C subroutine 
QSAR_FIELD_EVAL_RB_ATTEN, which uses the accompanying 
subroutine QSAR_FIELD_RB_WTS to generate an attenuated weight 
for each atom's contribution to the field(s), (Pseudo code for the 
latter subroutine appears in its header comment.) The attenuation 
factor (recommended value of 0.85) is a user-adjustable or 
"tailorable" value, here shown as COMFA!AGGREG_SCALING. The 
user-adjustable HBOND_RAD_SCALING parameter affects the steric 
"radius" of a hydrogen-bonding hydrogen. 

II. Patterson-Distribution Validation Code 

A. The SPL expression generator lrt_Jast returns the slope of 
the "best" line along with the count of data points and the 
fractional area, within a "virtual" or conceptual graph of absolute 
differences in biological activities vs absolute differences in the 
diversity measurement to be validated. The format of its output 
appears in the header comment. 

B. The short SPL expression generator dochi shows the 
computation ' of the chi-squared statistic resulting from the output of 
the lrt_fast expression generator. 

C The C code functions QSHELL_HIER_LR T, 
QSHELL_HIER_DO_LRT, and fptjxeapsort generate the results 
produced by lrt_fast. These routines generate the biological 
differences themselves but rely on some external procedure, not 
shown, to generate the distances between the diversity 
measurements. (The reason is that the method of calculating 
differences depends on the diversity parameter(s). Typically a 
Euclidean distance is calculated for scalar properties, or a Tanimoto 
difference is calculated for bitsets, and if multiple parameters are 
combined to form the diversity measurement to be validated then 
the relative weighting must also be specified by the user.) 

Section III. Supporting information for interpretation of the C code in 
Sections I and II. 
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A. Declarations of complex and non-standard data structures 
referenced by the declarations within these C procedures, specifically 
for molecules, atoms, and the regions, fields, and other user input 
information that are part of a CoMFA field description. 

B. Functional descriptions of all external subroutines called by 
these C procedures, ordered alphabetically. 
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# SECTION I -A. Macro BUILD_3D for generating and storing topomeric alignments 



©macro BUILD_3D CHOM 

# builds 3D models, 

# storage in a database or in a conformer column 
# 

ft 
# 

ft 
ft 
# 



either not-aligned (just uses Concord or as- is if from Unity, 

or minimizes input structure) 
or aligned for CoMFA (requires core structure as alignment template) 
with optional fixup of side chains, charge calculation 
is row ids in current MSS 

is storage code {will retrieve structure from same place or somewhere) 
is align (U or A) 
is basic building technique 
other arguments, used only if ALIGN is true, are elements 
of the global associative array CHOM! ALIGN 



$1 
$2 
$3 
$4 



# set up mol retrieval from MSS to be fast and clean 

localvar AFFECT_SUBSET_save 
localvar EXAMINE_TAILOR_MODE_save . 
localvar H IGHL IGHT_MS S_sa ve 
localvar I N FORM_s a ve 
localvar INPUT_MODE_save 
localvar RELATE_save 
localvar SHOW_MOLECULE_save 

localvar USER_FUNCTION_save natmcore heavy ys 
localvar align ma rid cgq_save tailor_bumps_save newc \ 
a b max_save usehs rat yrat nrat noth 



setvar AFFECT_SUBSET_save 
setvar EXAMINEJTAILOR_MODE_save 
setvar HIGHLIGHT_MSS_save 
setvar INFORM_save 
setvar INPUT_MODE_save 
setvar RELATE_save 
setvar SHOW__MOLECULE_save 
setvar USER_FUNCTION_save 
setvar cgq_save $CGQ_TIMEOUT 
set CGQ timeout 0 



$ TAILOR ! EXAMINE ! AFFECT__SUBSET 

$ TAILOR ! EXAMINE ! EXAMI NE_TAI LOR_MODE 

$ TAILOR • EXAMINE ! HIGHLIGHT_MSS 

$ TAILOR ! EXAMINE ! INFORM 

$ TAILOR ! EXAMINE ! INPUT_MODE 

$ TAILOR ! EXAMINE ! RELATE 

$ TAILOR ! EXAMINE ! SHOW_MOLECULE 

$TAILOR ! EXAMINE ! US ER_FUNCTION 



setvar TAILOR! EXAMINE !AFFECT_S UBS ET 
setvar TAILOR ! EXAMINE ! EXAMINE JTAILOR_MODE 
setvar TAILOR ! EXAMINE ! HIGHLIGHT_MSS 
setvar TAILOR ! EXAMINE ! INFORM 
setvar TAILOR ! EXAMINE ! INPUT_MODE 
setvar TAILOR! EXAMINE! RELATE 
setvar TAILOR ! EXAMINE ! SHOW_MOLECULE 
setvar TAILOR! EXAMINE ! US ER_FUNCTION 



NONE 
SILENT 
NO 
NO 

ROW__COLUMN_EXPR 

NO 

YES 

NONE 



setvar max_save $TAILOR ! MAXIMIN2 ! LS_STEP_SIZE $TAILOR 1MAXIMIN2 ! MAX IMUM_ ITERATION 

setvar ma %table_attribute ( MOL_AREA ) 

# if needed make new place to put output 
setvar newc 

switch %substr( $213) 
case NEW) 

setvar newc %math( %table( * COL COUNT ) + 1 ) 
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table column sin %cat ( CONF $newc ) 
case SYB) 

database open %qspr_table_db{ %table_default () ) update 
table ATTRIBUTE SET CON FORMER 0 

case ) 

setvar newc %substr{ $2 1 %math( %pos ( _ $2 ) - 1 ) 
TABLE CON FORMER $newc 

endswitch 

if %streql( %substr( $311) "A" ) 

# are we bump checking ? 
if $CHOM! Align [BUMPS] 

setvar tailor_bumps_save $TAI LOR! GENERAL !bumps_contact_di stance 
tailor set general bumps_contact_distance %math( $CHOM! Align [BUMPS] - 1.0 ) 
endif 

## 

# STEP 1 : prepare template fragment 
## 

setvar mcore $CHOM!Align[ MCORE ] 

# save original template 
setvar mcsav %molempty ( ) 
copy $mcore $mcsav 

default $mcore >$nulldev 
if $CHOM! Align [DEBUG] 
label id * 
endif 

setvar capsln %cat { %sln( $mcore ) ) 
setvar natcore %mol_info( $mcore NATOMS ) 

# IF the alignment template has just one free valence, 

# make geometrically acceptable template by adding heavy atoms, minimizing 

# else use as is 

setvar heavy TRUE 

fillvalence *-H* Hal >$nulldev 

if %gt( %math( %mol_inf o ( $mcore NATOMS ) - $natcore ) 1 ) 
copy $mcsav $mcore 
setvar heavy 

endif 
if $ heavy 
for a in %atoms (<H*>- <H>) 

modify atom type $a C.3 >$nulldev 
modify atom name $a XI >$nulldev 
endf or 
endif 

TAILOR SET MAXIMIN2 LS_STEP_SIZE 0.0001 MAXIMUM_ITERATIONS 1000 | | 
MAXIMIN $mcore DONE INTERACTIVE >$nulldev 

if $heavy 

for a in %atoms(Xl) 

modify atom type $a HEV >$nulldev 

# must rename it ! ! 

modify atom name $a XI >$nulldev 
endf or 

setvar ys %set_create{ %atoms(Xl) ) 

# orient template so that an R points in the positive X direction 
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setvar rat %arg( 1 %set_unpack( $ys ) ) 

setvar nrat %arg{ 1 %atom_info{ $rat NEIGHBORS ) ) 

setvar yrat %arg( l %set_unpack( %set_diff ( \ 

%set_create( %atom_info{ $nrat NEIGHBORS ) ) $rat ) ) ) 
ORIENT USER $nrat $rat $yrat >$nulldev 
endif 

# identify all the non-primary atoms for FIT, in/out of the search pattern 

# and all the basic torsions (bonds to Ys) that potentially need setting 

setvar tpat %arg( 1 %search2d( %cat ( %sln( $mcore ) ) $capsln NoDup 0 y ) ) 
setvar hvinpat 
setvar patats 
setvar tors 
setvar usehs 

setvar sybhvats %set_create (%atoms (*-<H>) ) 
if %lt( %set_size( $sybhvats ) 3 ) 
setvar usehs TRUE 

setvar sybhvats %set_create (%atoms {*) ) 

endif 

for a in %range(l %sln_atom_count ( $capsln ) ) 

if %or( "Susehs" "%not{ %set_and( %sln_atom symbol ( $capsln $a ) \ 
H,F,Cl,Br,I ) ) » ) 

# for FIT, need to know the SYBYL IDs of the heavy atoms 

setvar hvinpat $hvinpat $a 

setvar patats [ $a ] %sln_rgroup_sybid ( $mcore $tpat $a ) 
setvar patats [ $a ] [ YS ] %set_and{ "$ys" "%set_create ( \ 
%atom_info( $patats [ $a ] NEIGHBORS ) ) " ) 

# for each torsion root, need to save the SLN ID of an arbitrary 

# heavy atom torsional definer 
if $patats [ $a ] [ YS ] 

setvar tors [ $a ] %set_and( %set_diff{ "%set_create ( \ 
%atom_info( $patats [ $a ] NEIGHBORS ) ) » $patats [ $a ] [ YS ] ) $sybhvats ) 

# if there are several possibilities, prefer the lowest #'d carbon 

# to define trans-ness 

if %gt( %set_size{ $tors t $a ] ) 1 ) 

if %set_and( $tors [ $a ] %set_create( %atoms(<C*>) ) ) 
setvar tors [ $a ] %set_and( $tors [ $a ] \ 
%set_create( %atoms(<C*>) ) ) 

endif 

setvar tors [ $a ] %arg( l %set_unpack( $tors[ $a ] ) ) 

endif 

for al in %range(l %sln_atom_count ( $capsln ) ) 

if %eq( $tors[ $a ] %sln_rgroup_sybid ( $mcore $tpat $al ) ) 
setvar tors[$a] $al 
break 
endif 
endf or 
endif 
endif 
endf or 
if $CHOM! Align [DEBUG] 

echo %prompt ( INT 1 " " » " ) 
endif 
endif 

default $ma >$nulldev 
setvar CHOMIBadRows 
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# build 3D models 
## 

# off we go ! ! Get MSS row IDS to build models for 
if %streql( $1 * ) 

setvar rids %table( + ROW NUM ) 
else 

setvar rids %set__unpack ( $1 ) 
endif 

for rid in $rids 

# get the next MSS entry to be modelled 
table examine $rid | | >$nulldev 

# fix N02's (egad what a pain) because Concord & SYBYL are inconsistent 
setvar pat %search2d( %sln{ $ma ) N(=0)0 ALL 0 y ) 

while Spat 

setvar pat %sln_rgroup_sybid ( $ma %arg ( 1 $pat ) 1 3 ) 
modify bond type %bonds ( %cat ( %arg( 1 $pat ) " = " \ 

%arg( 2 $pat ) ) ) 2 >$nulldev 
modify atom type %arg{ 2 $pat ) o.2 
setvar pat %search2d( %sln{ $ma ) N(=0)0 ALL 0 y ) 
endwhile 

if $CHOM! Align [DEBUG] 

label id * 
endif 

# basic optimization 
switch $4 

case CONCORD) 

CONCORD MOL $ma >$nulldev 

# if Concord failed, we may still be awfully flat 

# minimize if there are heavy atoms not part of a single aromatic system . . 

setvar noth %atoms ( *-<H> ) 

setvar al %arg { 1 $noth ) 

if %set_diff( "%set_create( $noth )" \ 

"%set_create ( %atoms ( %cat ( M {aromatic ( 0 "$al" " ) } " ) ) ) " ) 
setvar zs %extent_3d( %cat ( $ma " (*) " ) 
setvar zs %math( %arg ( 5 $zs ) - %arg ( 6 $zs ) ) 
if %eq{ $zs 0.0 ) 

%unflatten( %cat { $ma "(*)" ) ) 
MAXIMIN $ma DONE INTERACTIVE 
endif 
endif 

case MINIMIZE) 

MAXIMIN $ma DONE INTERACTIVE >$nulldev 

endswitch 

# done, if only 3d coord, but for topomeric CoMFA .. 
if %streql( %substr( $311) "A" ) 

# find any arbitrary 2D hit 

setvar pat %search2d( %cat ( %sln( $ma ) ) $capsln NoDup 0 y ) 

if %not( $pat ) 

setvar CHOMIBadRows %set_or( u $CHOM!BadRows f ' $rid ) 

echo $capsln not found in molecule for Row $rid . . skipping 

goto nextl 



endif 

setvar pat %arg(l $pat ) 

setvar allpatats %set_create{ %sln_rgroup_sybid< $ma $pat \ 
%range{ l %sln_atom_count { $capsln ) ) ) ) 

# collect all appropriate heavy atoms for FIT and torsions 

setvar matl 
setvar mat2 
setvar schns 
for a in $hvinpat 

setvar matl $matl $patats [ $a ] 

setvar sybat %sln_rgroup_sybid ( $ma $pat $a ) 

setvar mat2 $mat2 $sybat 

# are there heavy atom neighbors to FIT also (and generate torsion lists)? 

if $patats[$a] [YS] 

setvar ans %set_diff ( %set_create( \ 

%atom_info( $sybat NEIGHBORS ) ) $allpatats ) 
setvar ans %atoms ($ans-<H>) 
setvar i 1 

for p in %set_unpack( $patats[$a] [YS] ) 

# add heavy atom neighbors to FIT list 

if %arg{ $i Sans ) 
setvar matl $matl $p 
setvar mat2 $mat2 %arg( $i $ans ) 

# generate another torsion for CHOMIalltrans 

setvar schns $schns %cat { $sybat " , " \ 
%sln_rgroup_sybid( $ma $pat $tors [ $a ] ) " , " %arg( $i $ans ) ) 

endif 

setvar i %math ( $i + 1 ) 
endf or 
endif 
endf or 

setvar dofit MATCH %cat ( $mcore "(" %set_create ( $matl ) ")" ) \ 

%cat( $ma n ( " %set_create( $mat2 ) ) 
$dofit >$nulldev 
if $CHOM! Align [DEBUG] 

echo %prompt( INT 1 " " " " ) 
endif 



# do FIT 

if %gt( $MATCH_RMS $CHOM!Align[ FITRMS ] ) 

setvar CHOMJBadRows %set_or( "$CHOM!BadRows M $rid ) 

echo Bad geometric alignment (MATCH_RMS = $MATCH_RMS) for Row $rid 

goto nextl 

endif 

# side chain alignments . . 

switch $CHOM!Align[ ALICYC ] 
case User_Macro) 

$CHOM! Align[ ALIDATA ] $ma $CHOM! ALIGN [ MCORE ] 

case All_trans) 
case With_Templates) 

setvar no j rings TRUE 

setvar rbds %set_create( %bonds ({ rings ()} ) ) 
for i in $schns 

setvar jbds %set_unpack{ $i ) 

# can set "side chain" bonds only if connecting bond is not cyclic 

if %set_and( "$rbds n "%bonds( %cat ( %arg ( 3 $jbds ) = \ 
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%arg( 1 $jbds ) ) ) " ) 
setvar no j rings 

else 

CHOMiAllTrans $jbds 

endif 
endf or 
if $CHOM! Align [DEBUG] 

echo %prompt ( INT 1 " n n " ) 
endif 

if %streql{ $CHOM!Align[ ALICYC ] With_Templates } 
setvar f %open( $CHOM!Align[ ALIDATA ] "r» ) 
setvar buff %read( $f ) 
setvar slnma treat ( %sln( $ma ) ) 
while $buff 

# each line of text should have pattern, SLN IDs for the 4 torsion atoms, 

# and a torsion value to set 

if %eq( %count( $buff ) 5 ) 

setvar torpat %search2d( $slnma %arg( 1 $buff ) NoDup 0 y ) 
for t in $torpat 

MODIFY TORSION %sln_rgroup_sybid ( $ma $t %arg { 2 $buff ) \ 
%arg( 3 $buff ) %arg( 4 $buff ) ) %arg ( 5 $buff ) >$nulldev 

endfor 
endif 
endwhile 
%close( $f ) 
endif 

endswitch 
endif 

# do a bump check? 

if $CHOM! Align [BUMPS] 

if %atoms ({bumps (*,*) }) 

setvar CHOMIBadRows %set_or( "$CHOM! BadRows" $rid ) 

echo Bad steric contacts in aligned conformer for Row $rid . . skipping 
goto nextl 
endif 
endif 

# partial charges . . 

switch $CHOM!Align[ CHARGE ] 
case None) 

case User_Macro) 

exec $CHOM!Align[ CHARGEDATA ] $ma 

case ) 

CHARGE $ma COMPUTE $CHOM! Align [ CHARGE ] | >$nulldev 
endswitch 

# put conformer away 
switch %substr( $213) 

case SYB) 

database add $ma r >$nulldev 

case ) 

%wcell{ $rid $newc %cat ( %cat { %sln( $ma FULL CHARGE ) ) ) ) >$nulldev 
endswitch 
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echo Built row $rid 
nextl : 
endf or 

if %streql< %substr( $311) "A" ) 

copy $mcsav $mcore 

zap $mcsav 
endif 

if $CHOM! Align [BUMPS] 

TAILOR SET GENERAL bumps_contact_di stance $tailor_bumps_save | | 
endif 

# done, restore initial EXAMINE settings 
set CGQ_TIMEOUT $cgq_save 

setvar TAILOR ! EXAMINE ! AFFECT_SUBSET $AFFECT_SUBSET_save 
setvar TAILOR ! EXAMINE ! EXAMINE_TAILOR__MODE $EXAMINE_TAILOR_MODE save 

setvar TAILOR! EXAMINE ! HIGHLIGHT_MSS $HIGHLIGHT_MSS save ~ 

setvar TAILOR! EXAMINE! INFORM $INFORM_save *~ 

setvar TAILOR ! EXAMINE ! INPUT_MODE $ INPUT_MODE_save 

setvar TAILOR ! EXAMINE ! RELATE $ RE LATE_s a ve 

setvar TAILOR! EXAMINE ! SHOW_MOLECULE $SHOW_MOLECULE save 

setvar TAILOR ! EXAMINE !USER_FUNCTION $USER_FUNCTION save 

TAILOR SET MAXIMIN2 LS_STEP_SIZE %arg ( 1 $max_save ) \ 
MAXIMUM_ITERATIONS %arg ( 2 $max_save ) | | 

# update row and column information 
if %streql( %substr( $213) NEW ) 

# make any new conformer column become the source of molecules 

TABLE CONF %table { * COL COUNT ) 
CHOM ! UPDATE_ROW_SEL $CHOM ! CID_Las t 
setvar CHOM! CID_Last %math ( $CHOM!CID_Last + 1 ) 
else 

CHOM ! UPDATE_ROW_SEL 
endif 



# Section I-B. Generates the topomeric conformation of the 3D model 
# 

tt===== = = = === ====s==========s========================== ========= = === 

©macro ALLTRANS chom 

# assumes default molecule, takes argument atoms $1 and $2 

# where $1 is the JOINed atom of the core, $2 is the atom that 

# the rest of the substituent is to be trans to, 

# and $3 is the JOINed atom of the substituent 

# starts from that atom and sets all side chains 

# to a topomeric conformation 



localvar bds b bdset al a2 tmp sbonds sats rbond pbds torsion ringbonds doit 

# check input for legality 

setvar tmp %set_create( %atom_info{ $1 NEIGHBORS ) ) 
if %not( %eq( 2 %count ( %set_unpack( %set_and( \ 
"$tmp" %cat ( $2 ", « $3 ) ) ) ) ) ) 
echo Bad input to ALLTRANS (atoms $2 $3 not bonded to $1) 
return 
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to 



endif 

# save key bonds 

setvar rbond %bonds ( %cat ( $3 n =" $1 ) ) 
setvar sats %conn_atoms( $3 $1 } 
if %not( $sats ) 

# echo No substituent atoms found in ALLTRANS 
return 

endif 

setvar sats $3 $sats 

setvar sbonds %set_ create ( %bonds ( \ 

%cat( n {TO_ATOMS( n %set_create ($sats) ")} n )) ) 

# define the other bonds that might need adjusting 

setvar bds %set_create ( %bonds ( ( * - { RINGS ( ) } ) &< l > ) ) 
setvar bds %set_and( "$sbonds u "$bds n ) 
if %not( $bds ) 

return 
endif 

# discard bonds to primary atoms 

setvar mval %set_create{ %atoms{ \ 

<H>+<o. 2>+<F>+<I>+<Cl>+<Br>+<n. !>+<LP>+<Du> ) ) 
setvar pds %set_create( %bonds { %cat ( " { TO_ATOMS ( " $mval ")}" ) ) ) 
setvar bds %set_diff ( $bds $pds ) 
setvar ringbonds %set_create ( %bonds ( { RINGS { ) } ) ) 

# walk all the important bonds 
for b in %set_unpack( $bds ) 

setvar doit TRUE 

# if this is the JOIN bond, already have some info 

if %eq{ $b $rbond ) 
setvar aO $2 
setvar al $1 
setvar a2 $3 

# still need to be SURE we're not monovalent 

if %or{ "%eq( 1 %count ( %atom_info( $al NEIGHBORS ) ) ) " \ 
"%eq( 1 %count( %atom_info( $a2 NEIGHBORS ) ) )" ) 
setvar doit 
endif 
else 

setvar bdat %bond_info( $b ORIGIN TARGET ) 
setvar al %arg( l $bdat ) 
setvar a2 %arg( 2 $bdat ) 

if %or{ "%eq( 1 %count ( %atom_info( $al NEIGHBORS ) ) )" \ 
"%eq< 1 %count( %atom_info( $a2 NEIGHBORS } ) ) »■ ) 
setvar doit 
endif 
if $doit 

# which end leads to root atom? if necessary flip al,a2 to make that one be al 

if %set_and( "%set_create ( %conn_atoms ( $a2 $al ) ) " $1 ) 

setvar tmp $al 

setvar al $a2 

setvar a2 $tmp 
endif 

setvar aO %trans_path( $al $a2 $1 ) 
endif 
endif 
if $doit 

setvar a3 %trans_path( $a2 $al ) 
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switch %count( %set_unpack( °%set_and( °$ringbonds n \ 

%set_create( %bonds ( %cat < $aO » = " $al n , " $a2 " = " $a3 ) ) ) ) n ) ) 

case 0) 

setvar torsion 180 

case 1) 

setvar torsion 90 

case 2) 

setvar torsion 60 
endswitch 

modify torsion $a0 $al $a2 $a3 $torsion >$nulldev 
endif 
endf or 



/* Beginning of section I-C, C code implementing the trans__path expression gener 
/*E + : SYB_MGEN_CONN_BEST*/ 

/*************************************************************^ 

* int SYB_MGEN_CONN_BEST ( identifier, nargs, args, writer ) * 

* Dick Cramer, Apr. 9, 1995 (written for SELECTOR use) * 

* * 

* Expression generator that returns the atoms attached to a given * 

* atom, excepting the second, in a prioritized order. * 

* If there are two arguments, the ordering is by decreasing branch * 

* "size", where "size" is first any path with rings encountered, then 

* number of attached atoms, then MW (paths in cycles end when an atom 

* in another path is encountered.) 

* If three arguments, the atom that is returned is the one that 

* begins the shortest path containing the atom referred to by the 
+ third argument. If multiple such paths, ordering is same as for 

* two arguments . 

* Further prioritization of paths is by molecular weight, 

* and then by lowest X, Y, 2- values. 

+ If last argument is DEBUG, all paths are written to stdout. 
* 

* User interface: 

* %trans_path( al a2 { a3 ) (DEBUG) ) 
************************************** 

int SYB_MGEN_CONN_BEST( identifier, nargs, args, Writer ) 

/* following arguments contain the text supplied to the %trans_path ( ) 

expression generator, and provide an avenue for producing text output. */ 
char *identifier; 
int nargs; 
char *args [] ; 
PFI Writer; 
{ 

* define MAX_NP 8 

struct pathrec { 

int root, nrings, chosen, nats; 
float mw, xyz[3]; 
setjptr path; 



struct pathrec p[MAX_NP]; 

A-12 



int retval, i, np, toroot, al, a2, a4, a, pnow, pdone, growing, 
final_pos, area_num, new_rings, nats, nuats, elem, ncycles, 
best, debug, ringclosed; 
List_Ptr atom_exp_list=NIL, SYB_EXPR_ANALYZE { ) ; 
mol_ptr ml, m2, SYB_AREA_GET_MOLECULE ( ) ; 

atomjptr arec, SYB_ATOM_FIND_REC { ) ; 
/* A setjptr data structure is a Boolean set, first word containing 
its cardinality. */ 

set_j)tr atom_setl=NIL, a2chk = NIL, nuls = NIL, cnats = NIL, 
nxcn = NIL, end_atoms = NIL, scratch = NIL, 
S YB_ATOM_F I ND_S ET ( ) , UTL_S ET_CREATE ( ) ; 
char tempString [256] ; 

float get_path_mw() , diff; 

void get_path_xy z ( ) ; 

retval = 0; 

/* Check the number of arguments */ 
if ( nargs < 2 | | nargs > 4 ) { 

UIMS2_WRITE_ERR0R { 

"Error: %trans_path requires 2 to 4 arguments\n" ); 

return 0; 

} 

np = 0; 

debug = ( !UTL_STR_CMP_NOCASE ( arqs [ nargs - 1], "DEBUG" )); 
toroot = (debug && nargs ==4) |] (! debug && nargs == 3); 

/* PARSE THE INPUT */ 

/* get first atom */ 

if ( ! (atom_exp_list = SYB_EXPR_ANALYZE ( SYB_EXPRJ5ETJVT0MJT0KEN, args[0], 
&final_pos, &area_num ) ) ) 
goto error; 

if (I (ml = SYB_AREA_GET_MOLECULE (area_num) ) ) 
goto cleanup; 

if (i(atom_setl = S YB_ATOM_F I ND_S ET ( ml, atom__exp_list ) ) ) 

goto error; 
if( atom_exp_list) 

SYB_EXPR_DELETE_RPN_LIST { atom_exp_list ) ; 
atom_exp_list = (List_Ptr) NIL; 

if ( ! (1 == UTL_SET_CARDINALITY(atom_setl) ) ) { 
UIMS2_WRITE_ERROR ( 

"Error: First argument must be only one atom\n"); 
goto error; 

} 

if (!(arec = SYB_ATOM_FIND_REC' (ml, UTL_SET_NEXT (atom setl, -1)) )) goto er 
al = arec->recno; ~~ 
UTL_SET_DESTROY( atom_setl ); 
atom_setl = NIL; 
/* get 2nd atom */ 

if ( ! (atom_exp_list = SYB_EXPR_ANALYZE { SYB_EXPR_GET ATOM TOKEN, arqs [11 
&final_pos, &area_num ).) ) ~ ~ 

goto error; 

if (!(m2 = SYB_AREA_GET_MOLECULE (area_num) ) ) 
goto cleanup; 

if ( ! (end_atoms = S YB_AT0M_F I ND_S ET ( m2, atom_exp list))) 
goto error; 
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if( atom_exp_list) 

S YB_EXPR_DELETE_RPN_LIST ( atom_exp_list ) ; 

atom_exp_list = (List_Ptr) NIL; 

if (ml != tn2 ) { 

UIMS2_WRITE_ERR0R ( 

"Error: atoms must be in the same molecule\n" ) ; 
goto error; 

if(!(l = = UTL_SET_CARDINALITY(end_atoms) ) ) { 
UIMS2_WRITE_ERR0R ( 

"Error: Second argument must be only one atom\n"); 
goto error; 

if (!(arec = S YB_ATOM_F I ND_R E C (ml, UTL_S ET_NEXT (end_atoms, -1)) )) goto er 
a2 = arec->recno; 

/ + get 3rd atom */ 
if (toroot) { 

if ( ! (atom_exp_list = SYB_EXPR_ANALYZE ( SYB_EXPR_GET_ATOM_TOKEN, args[2], 
&final_pos, &area_num ))) 
goto error; 

if (!(m2 = SYB_AREA_GET_MOLECULE (area_num) ) ) 
goto cleanup; 

if (!{atom_setl = SYB_ATOM_FIND_SET ( m2 , atom_exp_list) ) ) 

goto error; 
if( atom_exp_list) 

SYB_EXPR_DELETE_RPN_LIST ( atom_exp_list ) ; 

atom_exp_list = (List_Ptr) NIL; 

if (ml != m2 ) { 

UIMS2_WRITE_ERR0R ( 

"Error: atoms must be in the same molecule\n" ) ; 
goto error; 

if(!(l UTL_SET_CARDINALITY(atom_setl) ) ) { 
UIMS2_WRITE_ERR0R ( 

"Error: Second argument must be only one atom\n"); 
^ goto error; 

if (!(arec = SYB_ATOM_FIND_REC (ml, UTL_S ET_NEXT (atom_setl, -1)) )) goto er 
a4 = arec->recno; 



UTL_SET_DESTROY( atom_setl ); 
atom setl = NIL; 

} 

/* GENERATE the paths */ 

/* set up paths */ 

if (!(a2chk = UTL_SET_CREATE ( ml - >max_atoms + 1 ) )) goto error; 
if (!(nuls = UTL_SET_CREATE{ ml - >max_atoms + 1 ) )) goto error; 
if (!(cnats = UTL_SET_CREATE ( ml - >max_atoms + 1 ) )) goto error,' 
if ( ! (nxcn = UTL_SET_CREATE ( ml- >max_atoms + 1 ) )) goto error-' 
if (! (scratch = UTL_SET_CREATE ( ml - >max_atoms + 1 ) )) goto error; 

if ( !syb_mgen__conn_att_atoms ( a2chk, ml, al )) goto error; 
if ( !UTL_SET_MEMBER( a2chk, a2 ) ) { 
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UIMS2_WRITE_ERR0R ( 

"Error: second argument atom is not bonded to first argument atom/\n n ) 
goto error; 

} 

UTIi_SET_DELETE ( a2chk, a2 ); 
a - -1; 
np = 0; 

while (np < MAX_NP (a = UTL_SET_NEXT ( a2chk, a)) >= 0 ) { 

if ( ! (p [np] .path = UTL_SET_CREATE ( ml - >max_atoms + 1 ) )) goto error; 
p [np] . root = a; 
p [np] .nrings = 0; 

UTL_SET_INSERT( p[np].path, a ); 
^ np++; 

/* grow the paths */ 
growing = TRUE; 
nats = 0; 
ncycles = 0; 
while (growing ) { 

nuats - 0; 

ringclosed = FALSE; 

for (pnow = 0; pnow < np; pnow++ ) { 

UTL_SET_COPY_INPLACE ( cnats, p [pnow] .path ); 
UTL_SET_CLEAR ( nxcn ) ; 
elem = - 1 ; 

/* accumnulate this generation of attached atoms into nxcn */ 
while ( (elem = UTL_SET_NEXT { cnats, elem)) >= 0 ) { 
UTL_SET_CLEAR( nuls ); 

if ( !syb_mgen_conn_att_atoms ( nuls, ml, elem )) return ( FALSE ); 
UTL_SET_DELETE( nuls, al ); 

UTL_SET_DIFF_INPLACE( nuls, end_atoms, nuls ); 

UTL_SET_OR_INPLACE { nxcn, nuls, nxcn ); 
UTL_SET_DIFF_INPLACE( nxcn, p [pnow] . path, nxcn ); 

UTL_SET_OR_INPLACE ( p [pnow] . path, nxcn, p [pnow]. path ); 
/■* remove and mark ring closures when growing out */ 

if (Itoroot) for (pdone « 0; pdone < np; pdone++ ) if (pdone != pnow) { 
UTL_SET_AND_INPLACE ( p [pnow] . path, p [pdone] . path, a2chk ); 
if ((new_rings = UTL_SET_CARDINALITY ( a2chk ))) { 
/* we have ring closure (s) */ 

p [pnow] .nrings += new_rings; 
p [pdone] . nrings += new_rings; 
ringclosed = TRUE; 

UTL_SET_OR_INPLACE ( end_atoms, a2chk, end_atoms ); 
/* if pdone < pnow, two branches are now same lengths, drop common atom from bot 
but if >, branches are different, and must avoid repeated closing */ 
if (pdone < pnow) { 

/* remove atom(s) in the previous branch because paths are really same length 
UTL_SET_DIFF_INPLACE ( p [pdone] . path, a2chk, p [pdone] . path ); 
^ UTL_SET_DIFF_INPLACE ( p [pnow] . path, a2chk, p [pnow] .path ); 

else { 

/* must identify and mark each atom in nxcn that is attached to a2chk atom */ 

elem = - 1 ; 

while ( (elem = UTL_SET_NEXT ( a2chk, elem)) >= 0 ) { 
UTL_SET_CLEAR ( scratch ); 

if ( !syb_mgen_conn_att_atoms ( scratch, ml, elem )) 

return ( FALSE ) ; 
UTL_SET_AND_INPLACE ( scratch, nxcn, scratch ); 
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/ 



UTL SET OR INPLACE< end atoms, scratch, end_atoms }; 



/* done growing paths if no more atoms added to any path . . */ 
for {pdone = 0 # nuats = 0; pdone < np; pdone-*-+ ) 

nuats += UTLJSET_CARDINALITY ( p [pdone] . path ); 
if (nuats<=nats && iringclosed) growing = FALSE; 
nats = nuats; 

/* . . or looking for the 4th atom and found it . . */ 

if (toroot) for (pdone = 0; pdone < np; pdone++ ) 

if (UTL_SET_MEMBER< p [pdone] . path, a4 )) growing = FALSE; 
/* . . or after 100 atom layers out regardless */ 

ncycles++; 

if (ncycles >= 100) growing = FALSE; 

} 

/* debugging */ 

if (debug) for (pdone = 0; pdone < np; pdone++) { 

sprintf{ tempString, "Path %d (%d rings, from %d) : n , 

pdone+1, p [pdone] .nrings, p [pdone] .root ) ; 
UBS_OUTPUT_MESSAGE ( stdout, tempString ); 
ashow( p [pdone] .path, ml ); 

} 

/* compute the path properties */ 

for (pdone = 0; pdone < np; pdone++) { 
/* mark as already chosen any path that can't be an answer */ 

p [pdone] . chosen = toroot && !UTL_SET_MEMBER (p [pdone] .path, a4) ; 
p [pdone] .nats = UTL_SET_CARDINALITY ( p [pdone] . path ); 
p [pdone] .nrings = p [pdone] . nrings ? 1 : 0; 
p [pdone] .mw = 0.0; 

p [pdone] .xyz [0] = p [pdone] . xyz [1] = p [pdone] . xyz [2] = 0.0; 

} 

/* return the best result */ 
best = 0; 

for (pdone = 1; pdone < np; pdone++) { 
if (toroot) { 

if (p [best] . chosen !p [pdone] , chosen) best = pdone; 
/* looking backward along chain, always grow away from more negative coord value 
if ( !p [best] . chosen && ! p [pdone] . chosen) { 

get jpath_xyz ( p [pdone] .root, ml, p [pdone] .xyz ) ; 
get_path_xyz ( p [best] . root , ml, p[best].xyz ); 
for ( i = 0; i < 3; i++ ) { 

diff = p [pdone] .xyz [i] - p [best] .xyz [i] ; 
if (diff < -0.1) { 
best = pdone; 
break; 

} 

if (diff > 0.1 ) break; 
/* checking other coords if basically tied at this coord */ 



else { 

if (p[pdone] .nrings && !p[best] .nrings) best = pdone; 
else if (p [pdone] .nats > p [best] .nats) best = pdone; 
else if (p [pdone] .nats == p[best] .nats) { 

p [pdone] .mw « get_path_mw( p [pdone] .path, ml, p[pdone].mw ); 

p[best] .mw = get_path_mw ( p [best] .path, ml, p[best].mw ); 
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if (p[pdone] .mw > pfbest] .mw) best = pdone; 



arec = SYB_ATOM_FIND_REC { ml, p [best]. root ); 

sprint f (tempString, °%d" , arec- >id ) ; 
if ( ! (*Writer) (tempString) ) goto error; 

retval = TRUE ; 
error: 
cleanup: 

if( atom_exp_list) 

SYB_EXPR_DELETE_RPN_LIST( atom_exp_list ) ; 
if (atom_setl) 

UTL_SET_DESTROY (atom_setl) ; 
if (end_atoms) 

UTL_SET_DESTROY(end_atoms) ; 
if (a2chk) 

UTL_SET_DESTR0Y(a2chk) ; 
if (nuls) 

UTL_SET_DESTROY(nuls) ; 
if (nxcn) 

UTL_SET_DESTROY (nxcn) ; 
if (cnats) 

UTL_SET_DESTROY (cnats) ; 
if (scratch) 

UTL_SET_DESTROY (scratch) ; 

return ( retval ) ; 

} 

static int syb_mgen__conn_att_atoms ( aset, m, atid ) 

/* ors atoms attached to atm into aset */ 

/* WORKS STRUCTLY WITH RECNOS */ 

set_ptr aset; 

moljptr m; 

int atid; 

{ 

atomjtr at, SYB_ATOM_FIND_ID ( ) ; 
List_Ptr tohs, UTL_LIST_RETRIEVE_P ( ) ; 
atom_ptr toh, SYB_ATOM_FIND_REC ( ) ; 
acon_ptr connl; 
int nbytesl; 

at = SYB_ATOM_FIND_REC ( m, atid ); 

tohs = at- >conn_atom; 

while (tohs) { 

tohs = UTL_LIST_RETRIEVE_P( tohs, fcconnl, fcnbytesl) 
toh = SYB_ATOM_FIND_REC ( m, connl - >target ); 
UTL SET INSERT ( aset, toh->recno ); 

} " " 

return ( TRUE ) ; 

} 

static float get_path_mw( aset, m, mw ) 

/* returns the total atomic weight of all atoms in aset */ 
setjptr aset; 
mol_ptr re- 
float mw; 
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{ 

int elem = -1; 
float ans = 0.0; 

atomjptr at, SYB_ATOM_FIND_REC ( ) ; 
f pt S Y B_ATAB_ATOMI C_WE IGHT ( ) ; 

if (mw) return ( mw ); 
elem = - 1 ; 

while ( (elem = UTL_SET_NEXT( aset, elem)) >= 0 ) { 
at = SYB__ATOM_FIND_REC ( m, elem ); 

ans += (float) SYB_ATAB_ATOMIC_WEIGHT ( at->type ); 
return ( ans ) ; 

} 

static void get_path_xyz ( aid, m, mw ) 

/* returns the xyz of the supplied atom */ 

int aid; 

mol_ptr m; 

float mw [3] ; 

{ 

int i; 

atom_ptr at, SYB_ATOM_FIND_REC ( ) ; 

if (mw [0] ) return; 

at = SYB_ATOM_FIND_REC ( m, aid ); 

for (i = 0; i < 3; i++) mw[i] = at->xyz[i]; 

return; 

} 



static int ashow( aset, m ) 

/* for interactive debugging, shows a set's membership in terms of atom ID */ 
set_ptr aset; 
mol ptr m; 
{ 

char buff [1000] , *b; 

atom_ptr at , SYB_ATOM_FIND_REC ( ) ; 

int elem; 

*buf f = ' /0' ; 
b = buff; 
elem = - 1 ; 

while ( (elem = UTL_SET_NEXT ( aset, elem)) >= 0 ) { 
at = SYB_ATOM_FIND_REC ( m, elem ); 
sprintff b, " %d" , at->id ); 
b = buff + strlen( buff ); 

sprintf ( b, n \n n ) ; 

UBS_OUTPUT_MESSAGE ( stdout, buff ); 

/* BEGINNING OF SUBROUTINES I-D. Calculation of attenuated fields */ 
/* + E : Q S AR_ F I E LD_ E V AL_R B_ ATT E N ( ) */ 

/* ' */ 

/* int QSAR_FIEL.D_EVAL_RB_ATTEN< molp, stfldp, elfldp, regp, no st, no el, ctp ) 

/* */ 

/* Dick Cramer May 13, 1995 */ 

/* */ 
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/* 

"Standard CoMFA" except that the contribution of any atom 
to the field falls off with an inverse power of its distance 
from a root atom, measured in NUMBER OF ROTATABLE BONDS! 

This means also that each individual atom's contribution 
has a similarly scaled upper bound, rather than checking 
the upper bound only for the sum over all atoms. 

*/ 

/* This procedure computes vdW 6-12 steric values at each point in region */ 



/* and the electrostatic interactions (initially assuming 1/r dielectric) . */ 

/* */ 

/* NOTE:: initially ignoring space averaging, other user knobs.' */ 

/* note:: assuming valid input here; error checking higher up ! */ 

/* */ 

/* */ 

/* Input: */ 

/* molp - molecule pointer, molecule to place in region. */ 

/* stfldp - steric field pointer, where values will be placed. */ 

/* elfldp - electrostatic field pointer, where values will be placed. */ 

/* regp - region pointer, locations where values are to be evaluated. */ 

/* no_st - flag to skip steric evaluations */ 

/* no_el - flag to skip electrostatic evaluations */ 

/* ctp - ComfaTopPtr, for dummy/lp values */ 

/* */ 

/* Returns 0 on failure, 1 otherwise. */ 

/* */ 



/ * + E : QSAR_FIELD_EVAL_RB_ATTEN ( ) * / 

int QSAR_FIELD_EVAL_RB_ATTEN ( molp, stfldp, elfldp, regp , no_st, no_el , ctp) 

mol_ptr molp; 

FieldPtr stfldp, elfldp; 

RegionPtr regp; 

int no_st, no_el ; 

ComfaTopPtr ctp; 

{ 

BoxPtr box; 

atomjptr at, SYB_ATOM_FIND_ID ( ) ; 

int pid, b, ix, iy, iz, nat, vol_avg, repulsive ; 

fpt *steric, *elect, SYB_ATAB_VDW_RADII { ) ; 

fpt diff, dis, dis2, x # y, z, sum_steric, sum_elect ; 

fpt dis6, disl2 , repuls_val, offs[9][3], atm_ste, atm_ele; 

fpt *charge, *ctemp, *coord, *ftemp, *wt, scale_vol_avg, atm_steric, atm_elect; 
int *atyp , *itemp, dohbd, dohba, ishbd, retval, dielectric , off, atid; 
static fpt hbond_scal; 

fpt hbond_A, hbond_B, *AtWts = NIL, *QSAR_FIELD_RB_WTS ( ) ; 

int *HAs, *HDs, *HAp, *HDp; /* sets would be more efficient but slower */ 

int do_steric, do_elect; 

set_ptr hdonor, SYB_HBOND_DONORS ( ) , pset = NIL, aset = NIL; 

#define Q2KC 332,0 

^define MIN_SQ_DISTANCE 1.0e-4 

/* any atom within 10-2 Angstroms is hereby zapped ! 

this is about it: 10*6 / 10^-24 is close to overflow! */ 

ftemp = NIL; ctemp = NIL; itemp = NIL; retval = FALSE; HAS = NIL; HDs = NIL; 
hdonor = NIL; 

/* for now, make root atom the one closest to 0,0,0 */ 
for (nat = 1; nat <= molp- >natoms ; nat++) { 
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at = SYB_ATOM_FIND_ID ( molp, nat ); 

dis2 = at->xyz[0] * at->xyz[0] + at->xyz[l] * at->xyz[l] + 

at->xyz[2] * at->xyz[2); 
if (nat --l|| dis2 < dis) { 

dis = dis2; 

atid = nat; 



/+ following is specific to topomeric fields */ 
if (!(AtWts = QSAR_FIELD_RB_WTS ( molp, atid ) )) goto cleanup; 

if <!no_el) 
{dielectric = elf Idp- >dielectric ; 
vol_avg = elf Idp- >vol_avg_type; 
scale_vol_avg = elf Idp- >scale_vol_avg; 
repulsive = elf Idp- >repulsive; 

repuls_val=repexp [repulsive] ; elect = elfldp -> field_value; } 
if (!no_st) 
{vol_avg = stfldp->vol_avg_type; 
scale_vol_avg « stf Idp- >scale_vol_avg; 
repulsive = stf Idp- >repulsive; 

repuls_val=repexp [repulsive] ; steric = stfldp -> f ieldjvalue; } 

if (Mftemp = (fpt *) UTL_MEM_ALLOC (3* sizeof (fpt) *molp- >natoms) ) ) goto cleanup; 

if {!(ctemp = (fpt *) UTL_MEM_ALLOC ( sizeof ( fpt ) *molp- >natoms) ) ) goto cleanup; 

if {!(itemp = (int *) UTL_MEM_ALLOC ( sizeof (int) *molp- >natoms) ) ) goto cleanup; 

if (!(HAs = (int *) UTL_MEM_ALLOC ( sizeof ( int ) *molp- >natoms) ) ) goto cleanup; 

if (!(HDs = (int *) UTL_MEM_ALLOC { sizeof ( int ) *molp- >natoms) ) ) goto cleanup; 
/* get just those H's which are capable of Hbonding */ 

if (!(hdonor - SYB_HBOND_DONORS ( molp, NIL ) )) goto cleanup; 

for (coord=f temp, atyp=i temp, charge=ctemp , HAp=HAs , HDp=HDs , nat=l; 
nat<=molp - >natoms ; nat ++ ) 
{ if (NIL «(at = SYB_ATOM_FIND_ID(molp, nat) ) ) goto cleanup; 
*coord++ = at->xyz[0]; 
*coord++ = at->xyz[l]; 
*coord++ = at->xyz[2]; 
*atyp++ = at->type -1 ; 
*charge++ = at->charge; 

*HAp++ = SYB_ATAB_HBOND_ACCEPT(at->type) ; 
*HDp++ = UTL_SET_MEMBER ( hdonor , at->recno) ; 

} 

for (b=0; b<regp- >n_boxes ; b+ + ) { 
box = & regp->box_array [b] ; 

dohbd = (SYB_ATAB_ATOMIC_NUMBER( box- >atom_type) ==1) && 

(box- >pt_charge == 1.0); 
dohba = (SYB_ATAB_ATOMIC_NUMBER( box- >atom_type ) ==8) && 

(box- >pt_charge == -1.0); 
if (dohbd | | dohba) { 

if ( ! TAI LOR_STORE_IT_HERE ( "TAILOR ! FORCE_FIELD ! HBOND_RAD_SCALING" , 
&hbond_scal, 1)) goto cleanup; 

hbond_A = pow( hbond_scal, 6.0 ); 

hbond B = hbond_A * hbond_A; 

} 

if (vol_avg) 

QSAR_FIELD_EVAL_GETOFF (of f s,box- >stepsize , vol_avg, scale_vol_avg) ; 
if ( !no_st ) 

QSAR_FIELD_VDWTAB ( box -> atom_type, repuls_val, ctp- >du_lp_steric ); 
for (iz=0, z=box->lo[2] ; iz < box- >nstep [2 ] ; iz + + , z += box- >stepsize [2] ) 
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for {iy=0, y=box- >lo [l] ; iy < box- >nstep [1] ; iy++, y += box- >stepsize [l] ) 
for (ix=0, x=box->lo[0] ; ix < box- >nstep [0 j ; ix++, x += box- >stepsize [0] ) 

{ 

for ( coord = ftemp, charge = ctemp, atyp = itemp, HAp=HAs, HDp=HDs, 

do_steric=TRUE, do_elect=TRUE # nat=0, sum_steric = sum_elect = 0.0, 
nat<molp- >natoms; 
nat++, wt + + ) 

if ( ( *atyp == DUMMY- 1 || -*atyp « LP-l ) && ! ctp- >du_lp_elect ) 

*charge =0.0; /* set charge to 0 since ignoring Du/lp */ 
if {!vol avg) /* the "normal" case */ 

{ 

dis2 = x - *coord++ ; 

dis2 * = dis2; 

diff = y - *coord++ ; 

diff *= diff; 

dis2 += diff; 

diff = 2 - *coord++ ; 

diff *« diff; 

dis2 + = diff; 

if { !no_el && elf Idp- >zap_el==2 && do_elect) { 
dis = sqrt ( dis2 ) ; 

if ( dis < S YB_ATAB_VDW_RAD 1 1 ( *atyp+l ) ) { 
/* no shortcircuits ! */ 
/* 

*elect++ = 0.0; 
do_elect = FALSE; 

*/ 



if ( dis2 < MIN_SQ_DISTANCE ) { 
if ( !no_st ) 

/* if atom has no steric value, we don't care about 

M I N_S Q_D I STANC E since it has no contribution anyway */ 
if ( vdw_a[*atyp] != 0.0 && vdw_b[*atyp] 0.0 ) { 

/* set sterics to its max value at current grid pt . */ 

atm_steric = (*wt) * stf ldp- >max_value; 

if ( !no_el && do_elect) { 

if { !no_st ScSc !do_steric && elf ldp- >zap_el ) { 
*elect++ = DAB__F_MISSING; 

else if { *charge != 0.0 ) { 
if ( *charge > 0.0 ) 

atm_elect = (*wt) * elf ldp- >max_value; 
else atm_elect = (*wt) * -elf ldp- >max_value; 



if ( !do_elect && !do_steric ) 

break; /* break out of loop since neither el. or st . 

need to be calculated for this grid point */ 

/* setting dis2 to 1 (an arbitrary no.) will prevent a zero 

divide in the sum_steric or sum_elect calculations below */ 
dis2 = 1.0; 
} 

if { ! no_st && do_steric ) { 
dis6 = dis2 * dis2 * dis2; 
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disl2= dis6 * dis6 ; 
if (repulsive) 

disl2 o (repulsive==l) ? disl2 / dis2 : disl2 / dis2 / dis2; 
if (dohbd && *HAp) 

atm_steric = hbond_B * vdw_b [*atypj /disl2 - 
hbond_A * vdw_a [*atyp] /dis6 ; 
else if (dohba && *HDp) 

atm_steric = hbond_B * vdw_b [ *atyp] /disl2 - 
hbond_A * vdw_a [*atyp] /dis6 ; 

else 

atm_steric = vdw_b [*atyp] /disl2 - vdw_a [*atyp] /dis6 ; 
HAp++; HDp++; } 

atm_steric « atm_steric > stf ldp- >max_jvalue ? stf ldp- >max_value 

: atm_steric; 
atm_steric * = (*wt) ; 
if ( ! no_el && do_elect ) { 
atm_elect = *charge++ / 

( dielectric ? sqrt(dis2) : dis2 ) ; 
atm_elect = atm_elect > elf ldp- >max_value ? elf ldp- >max_value 
: atm_elect; 

atm_elect = atrn_elect < - (elf ldp- >max_value) ? - (elf ldp- >max_value) 

: atm_elect; 
atm_elect * = (*wt) ; 
sum elect += atm elect; 

} " 

atyp++; 

sum_steric += atm_steric; 
else 

for (off=0;off<9;off++) 



} 



coord += 3; 
atyp ++ 
charge ++ 
HAp + + 
HDp + + 

} /* atom loop */ 
doneatoms : 



if ( do_steric 
if (vol_avg) 



| do_elect ) { 
. sum_elect /= 9.0; sum_steric /= 9.0 ; } 
if ( !no_el && do_elect ) 
{ *elect = sum_elect * box-> pt_charge * Q2KC ; 

if ( *elect > elf ldp- >max_value ) *elect = elf ldp- >max_value; 
else if ( *elect < - elf ldp- >max_value ) *elect = 
- elf ldp- >max__value; 
transf orm__f ield (elf ldp- >max_value , elect , ctp) ; 
elect ++; 

} 

if ( !no_st ScSc do_steric ) 
{ *steric = sum_steric ; 

if ( *steric > stf ldp- >max_value) 
{ *steric = stf ldp- >max_value; 

if (!no_el && elf ldp- >zap_el==l ) * (elect- 1) = DAB_F_MISS ING ; } 
transform_f ield (stf ldp- >max_value, steric, ctp) ; 
steric ++ ; } 

, } 

} /* points in box loop */ 
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} /* boxes loop */ 



retval = TRUE; 
cleanup: 

if { itemp) UTL_MEM_FREE ( itemp); 
if ( ftemp) UTL_MEM_FREE ( f temp) ; 
if ( ctemp) UTL_MEM_FREE ( ctemp); 
if (HAS) UTL_MEM_FREE ( HAs ); 
if (HDs) UTL_MEM_FREE ( HDs ); 
if (hdonor) UTL_SET_DESTROY ( hdonor ); 
if (AtWts) UTL_MEM_FREE ( AtWtS ); 
if (pset) UTL_MEM_FREE ( pset ); 
if (aset) UTL_MEM_FREE ( aset ); 
return retval; 



#undef Q2KC 

#undef MIN_SQ_DISTANCE 
} 

/* 

Static fpt *QSAR_FIELD_RB_WTS { molp, rootid ) 

/* generates rotational -bond wts for each atom */ 

mol_ptr molp; 

int rootid; 

{ 

/* pseudo code for FIELD_RB_WTS ( ) 



while saw new atoms 

uncover atoms that stopped last shell growth 
grow next "rotational shell" 
while adding to shell 

for each atom in shell 
get neighbors not seen 
for each neighbor 

if bond is rotatable (acyclic, >1 attached atom, not = ,am,#) 

cover all other atoms attached to atom for this shell 
add it to shell 

*/ 



NIL, *vals = NIL, factor, nowfact = 1.0; 
found, aggcount, atid, aggid, loop, size; 

aggats = NIL, allats = NIL, nuls - NIL, endatms = NIL, end_cands 
root, SYB_ATOM_FIND_REC { ) , at, atrec ; 
b , S YB_BOND_FIND_REC ( ) ; 
toats , UTL_LIST_RETRIEVE_P ( ) ; 
cptr; 
tempString [200] ; 

ashow ( ) , qsar_f ield_attached_atoms { ) ; 

if (!( vals = (fpt *) UTL_MEM_ALLOC ( sizeof ( f pt ) *molp- >natoms) ) ) returnt NI 
if ( !UIMS2_VAR_GET_TOKEN( " TAILOR ! COMFA ! AGGREG_DESCALE " , 

&f actor ) ) return ( NIL ); 
if (! (allats = UTL_SET_CREATE ( molp- >max_atoms + 1 ) )) goto cleanup; 
if (! (aggats = UTL_SET_CREATE ( molp- >max_atoms + 1 ) ) ) goto cleanup; 
if ( ! (nuls = UTL_SET_CREATE ( molp- >max_atoms + 1 ) )) goto cleanup; 
if (! (endatms = UTL_SET_CREATE ( molp- >max_atoms + 1 ) )) goto cleanup; 
if ( ! (end_cands = UTL_SET_CREATE ( molp- >max__atoms + 1 ) )) goto cleanup; 
if (!( root » SYB_ATOM_FIND_REC( molp, rootid ) )) goto cleanup; 
UTL_SET_INSERT( aggats, root->recno ); 
UTL_SET_INSERT ( allats, root-> recno ); 
aggcount = loop = 1; 



fpt *ansr = 
int 

set_ptr 

atom_ptr 

bond__ptr 

List_Ptr 

acon_ptr 

char 

void 
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while (TRUE) { 

while (TRUE) { 
aggid = -1; 

while ((aggid = UTL_SET_NEXT( allats, aggid )) >= 0 ) { 
UTL_SET_CLEAR ( nuls ); 

qsar_f ield_attached_atoms ( nuls, molp, aggid ); 
UTL_SET_DIFF_INPLACE( nuls, allats, nuls ); 
UTL_SET_DIFF_INPLACE ( nuls, endatms, nuls ); 
/* identifying any atoms that terminate this aggregate */ 
atid « -1; 

while ((atid = UTL_SET_NEXT( nuls, atid ) ) >= 0 
if (!( at « SYB_ATOM_FIND_REC ( molp, atid ) ) )' 
/* skipping monovalent atoms */ 

if (at->nbond > -1) { 
/* find bond record that attaches to aggid */ 

toats = at->conn__atom; 
found = FALSE; 
while (toats && ! found ) { 

toats = UTL_LIST_RETRIEVE_P ( toats, &cptr, fcsize ) ; 
found = (cptr-> target == aggid ); 



) { 

goto cleanup; 



} 

if 
b 



(! found) goto cleanup; 

SYB_BOND_FIND_REC (molp, cptr- >bond_rec ) ; 
if ( !(b->Status & BOND_V__IRING) && !{b->status & BOND_V_ERI 
&& (b->type == SYB_BTAB_MNEM_TO_TYPE ( " 1 " ) ) ) { 
/* have an end -of -aggregate atom, mark as end atoms all other attached atoms */ 

UTL_SET_CLEAR ( end_cands ) ; 

qsar_f ield_attached_atoms ( end_cands, molp, at->recno ); 

UTL_SET_DELETE ( end_cands, aggid ); 

UTL SET OR INPLACE( endatms, end cands, endatms ); 



} 



} 



} 



} 



UTL_SET_OR_INPLACE ( aggats, nuls, aggats ); 

if ( UTL_S ET_C ARD INAL I T Y ( aggats ) <= aggcount ) break; 
aggcount = UTL_SET_CARDINALITY ( aggats ); 
UTL SET OR_INPLACE( allats, aggats, allats ); 



/* debugging stuff 



sprintf( tempString, "Aggregate %d (weight = %f ) 
UBS_OUTPUT_MESSAGE ( stdout, tempString ); 
ashow( aggats, molp ); 



loop, nowfact ) ; 



*/ 

/* if no atoms added, we are done! */ 

if (UTL_SET_EMPTY{ aggats )) break; 
/* record scaling factor for atoms in this aggregate */ 

atid = -1; 

while ((atid - UTL_SET_NEXT ( aggats, atid )) >= 0 ) { 

if ( ! (atrec = SYB_ATOM_FIND_REC ( molp, atid ))) goto cleanup; 
vals[ (atrec - >id) -1 ] = nowfact; 

UTL_SET_OR_INPLACE ( allats, aggats, allats ); 

UTL_SET_CLEAR ( aggats ) ; 

UTL_SET_CLEAR( endatms ); 

aggcount = 0; 

nowfact *= factor; 

loop++; 
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ansr = vals; 
cleanup: 

if (aggats) UTL__SET_DESTROY ( aggats ); 

if (allats) UTL_SET_DESTROY( allats ); 

if (endatms) UTL_SET_DESTROY ( endatms ); 

if (end_cands) UTL_SET_DESTROY( end^cands ); 

if (nuls) UTL_SET_DESTROY ( nuls ); 

return ( ansr ) ; 



static void qsar_f ield_attached_atoms ( aset, m, atid ) 

/+ ors atoms attached to atm into aset */ 

/* WORKS STRUCTLY WITH RECNOS */ 

setjotr aset; 

mol_ptr m; 

int atid; 

{ 

atomj)tr at, SYB_ATOM_FIND_ID ( ) ; 
List_Ptr tohs, UTL_LIST_RETRIEVE_P ( ) ; 
atom_ptr ton, SYB_ATOM_FIND_REC ( ) ; 
acon_ptr connl ; 
int nbytesl; 

at = SYB_ATOM_FIND_REC ( m, atid ); 
tohs = at - >conn__atom; 
while (tohs) { 

tohs = UTL_LIST_RETRIEVE_P{ tohs, &connl , &nbytesl) ; 

toh = SYB_ATOM_FIND_REC { m, connl - >target ); 

UTL SET INSERT ( aset, toh->recno ); 

} " " 

return; 

} 



static void ashow( aset, m ) 

/* for interactive debugging, shows a set's membership in terms of atom ID */ 
set_ptr aset; 
mol ptr m; 
{ 

char buff [1000] , *b; 

atomj)tr at, SYB_ATOM_FIND_REC ( ) ; 

int elem; 

*buff = '/0' ; 
b = buff; 
elem = - 1 ; 

while ( (elem = UTL_SET_NEXT ( aset, elem)) >= 0 ) { 
at = SYB_ATOM_FIND_REC (" m, elem ); 
sprintf( b, " %d", at->id ); 
b = buff + strlen{ buff ); 

} 

sprint f ( b, "\n" ) ; 

UBS_OUTPUT_MESSAGE ( stdout , buff ); 
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ft 

ft Section II-A. SPL invoked shell for computing the diagonal defining the 

ft "best" triangle, e.g., the one with the highest density of points below. 

ft 

@expression_generator LRT_FAST 



ft Usage: 

ft lrt_fast rows descriptor_cols bio_col [pis flags like scaling in quotes] 
ft rows {*) - rows to take 

# descriptor_cols - which columns are the neighborhood metrics 
ft bio_col - which column has the bio (probably log bio) data 

ft [...]- if need to SCAL NONE or anything like that, doit here 

ft 

# returns a line of the form 

# 3.09691 / 0.000546509 = 5666.71 - 496 : 496 :: 15.6981 : 15.6989 
ft * max bio difference 

ft A optimal distance division for max bio 

ft * slope 

ft A number in the lrt 

# ^total number 

#" A area in the lrt 

ft ^total area 

ft Significance is related to whether ratio of numbers is 

ft much above ratio of areas. 

ft 



globalvar SAMPLS_IN_PROGRESS DONE_CHECKED_OUT 
localvar hold distname rows cols bio 

setvar rows %promptif ( " $1 " R0W_EXP "*" "Rows to use in lrt") 

setvar cols %promptif ( "$2 " C0L_EXP " COMFA* " "Columns of mol descriptors") 

setvar bio %promptif ( "$3 " C0L_EXP "LOGBIO" "Column of bio data") 

setvar hold SAMPLS_IN_PROGRESS 
setvar SAMPLS_IN_PROGRESS $bio 

setvar distname TAILOR ! HIER !DIST_FNAME ■ 
setvar TAILOR ! HIER ! DIST_FNAME lrt_fort.3 

# here the information is computed and written to a file 

# whose name is passed in via a TAILOR value 
QSAR ANA DO I >$NULLDEV $rows $COls HIER $4 | 

setvar SAMPLS_IN_PROGRESS $hold 

isetvar TAILOR! HIER !DIST_FNAME $distname 

# contents of the file are returned to the caller 
setvar hold %system("cat lrt_fort.3") 

%return( "$hold" ) 



# 

# Section II-B. SPL script for computing the significance of the distribution 

# found by lrt_fast 
# 

@expression_generator dochi 

# computes the chi-square statistic for the number of points below 

# the diagonal, null hyptheses being the area fraction of the total. 
# 

# To be called as: %dochi ( %lrt_fast( ) ), i.e., its inputs 
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# are exactly the output of %lrt_fast as described in the lrt_fast header. 
8 

setvar expected %math{ $9 * $11 / $13 ) 
setvar sq %math( $7 - $expected ) 
setvar sq %math( $sq * $sq / $expected ) 
%return( $sq ) 



/* Section II-C. Computes the best diagonal in the "virtual graph" of biological 
distances vs property differences. */ 

int QSHELL_HIER_LRT ( table , biocol , dmat , nrow , order , lmsg) 
char * table; 

int biocol, /* column in MSS with biological data */ 

nrow, /* dimension of dmat and order */ 

*order; /* array of row IDs to consider */ 
fpt *dmat; /* distance matrix for property distances */ 
char *lmsg; /* file name for results */ 

{ 

fpt *p, *q, fabsO, bmax; 
int i,j, count, status_array ; 
char *f pt_colname; 
FILE *out, *UTL_FILE_FOPEN() ; 

/* need to get the bio values 

In the n A 2 we can repack into n(n-l)/2 then add the n bio values 
and finish with the bio distances */ 

/* 

No error handling. Better be data in those rows! 

V 

for (count=0 # i=0; i<nrow; i++) 
for (j=0; j<i; 

dmat [count++] = dmat[i*nrow + j]; 

q = p = dmat + ( (nrow-1) * nrow) / 2; 

TBL_ACCESS_INDEX_TO_COLNAME (table, biocol-1, &fpt_colname) ; 
TBL_GRAB_INIT_FPTS (table, 1, &fpt_colname ); 
for ( i=0 ; i<nrow; i++, p++) 

TBL_GRAB_GET_FPTS_INV (order [i] -1, &status_array , p) ; 
TBL_GRAB_COMPLETE_FPTS ( ) ; 

bmax = 0.0; 

for (count=0, i=0; i<nrow; i++) 
for (j=0; j<i; count++) 

if ( (p [count] = fabs(q[i] - q[j])) > bmax) bmax = p [count]; 

out = UTL_FILE_FOPEN (lmsg, "w") ; 
QSHELL_HIER_DO_LRT (out, count, dmat, p, bmax) ; 

UTL FILE FCLOSE(out); 
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21 



• 



int QSHELL - HIER_DO_LRT( out, index, xsort # ysort, bmax ) 
FILE *OUt; 

fpt *xsort, *ysort, bmax; 
int index; 

int *order, count, 3, 1, bad; 
int bestN, bestl; 
fpt den, bestDen; 

tfdefine CUTOFF ( bmax * ( xsort [order [i] ] / xsort [order [j 1 ] ) ) 

if (! (order = (int *) UTL_MEM_ALLOC ( index *sizeof(int )))) return 0; 
for (i=0; i<index; i + + ) order [i] =i; 
bestN = bestl = bad = 0; 
bestDen » 0.0; 

fpt_heapsort (index, xsort, order) ; 

for (j=0;count=0, bad=0, j<index ;j++) 

if (xsort (order [j] ] <= 0.0) continue; 
for {i=0; i<=j ; i++) 

if (ysort [order [i] ] <= CUTOFF) count ++ ; 
else bad++; 
} /* loop over all d <= this distance */ 
if ( (den = count/ bmax / xsort [order [j ] ] *2.0) > bestDen) 
{bestDen = den; bestl = j; bestN = index - bad;} 
} /* loop over all distances */ 

den = bmax * xsort [order [index- 1] ] ; 

sprintf (msg, "%g / %g = %g - %d : %d : : %g : %g\n", 

bmax, xsort [order [bestl] ] , bmax/xsort [order [bestl] ] , 
bestN, index, den-xsort [order [bestl] ] *bmax/2 . 0 , den) ; 

UBS_OUTPUT_MESSAGE (out , msg) ; 

UTL_MEM_FREE (order) ; 

return 1; 

} 
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/* n is number of elements 

arrin is array of floats to be sorted 
indx is array of ints initially 0...n-l 

*/ 

int f pt_heapsort (n, arrin, indx) 

int n; 

fpt *arrin; 

int *indx; 



int 1, ir, indxt, i, j ; 
fpt q; 

1 - n/2 ; 
ir = n -1 ; 

while (TRUE) /* the "10" loop */ 

if (1>0) { indxt = indx[--l]; q = arrin [indxt] ; } 



indxt = indx[ir]; q = arrin [ indxt] ; 
indx[ir--] = indx[0] ; 
if ( ir == 0 ) 

{ indx[0] = indxt; return 1; } /* <=== Only way out ! */ 

} 

i = 1; 
i = 1; 

j = 1 + 1 +1; 

while (j <= ir) /* the "20" loop */ 

if ( (j<ir) && (arrin[indx[j] ] < arrin [indx [j +1] ] ) ) j++ ; 
if (q < arrin [indx[j] ] ) f indx[i] - indx[j); i = j; j = j+j+1; } 



else 



else 




} 



indx[i] = indxt; 
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/* SECTION III -A. Declarations for all non-standard data structures referenced 
in the C code functions shown in Sections I and II. */ 
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*/ 
*/ 
*/ 
*/ 
*/ 
*/ 
*/ 
*/ 
*/ 



h 
I* 
/* 
/* 
/* 
/* 
/* 
/* 
/* 



Molecule and Supporting Structure Definitions 

John McAlister * 09 -Aug- 1985 

This file contains the definitions for the molecular data struc- 
tures required within SYBYL. The contents of this file are des- 
described in detail in the document "SYBYL Molecular Data Struc- 
tures" . 



Define the molecule descriptor template 
typedef struct molecule_struct { 



char 
i32 

List_Ptr 

i32 

char 

stamp 

stamp 

int 

int 

List_Ptr 

int 

int 

List_Ptr 

int 

int 

List_Ptr 
List_Ptr 
int 
int 

List_Ptr 

int 

int 

List_Ptr 
int 
fpt 
fpt 

List Ptr 



*name; 

type; 

diet ; 

status; 
* comment ; 

cre_time; 

mod_t ime ; 

max_props ; 

nprops ; 

props ; 

max_f eats; 

nf eats; 

feats; 

max_subst; 

nsubst ; 

subst; 



/* 

/* 
/* 
/* 
/* 
/* 
/* 
/* 
/* 
/* 
/* 
/* 
/* 
/* 
/* 
/* 



pointer to molecule name 
molecule type 

list of dictionaries used with molecule 
molecule status 

pointer to comment for molecule 
creation time/user/version stamp 
modification time/user/version stamp 
maximum properties currently allocated 
number of molecular properties 
pointer to list of properties 
maximum features currently allocated 
number of molecular features 
pointer to list of molecular features 



*/ 
*/ 
*/ 
*/ 
*/ 
*/ 
*/ 
*/ 
*/ 
*/ 
*/ 
*/ 
*/ 



maximum substructures currently allocated*/ 



number of substructures in molecule 
pointer to list of substructures 
subst__roots; /* pointer to list of root subst offsets 



max_atoms ; 
natoms ; 
atoms; 
max__bonds ; 
nbonds ; 
bonds ; 
charges; 
vector [3] ; 
matrix[9] ; 
assoc data; 



} molecule , *mol_j?tr ; 



/* maximum atoms currently allocated 
/* number of atoms in molecule 
/* pointer to atom array segment list 
/* maximum bonds currently allocated 
/* number of bonds in molecule 
/* pointer to bond array segment list 
/* type of atomic charges, if present 
/* translation vector for molecule 
/* rotation matrix for molecule 
/* pointer to list of associated data 
/* descriptors 



*/ 
*/ 
*/ 
*/ 
*/ 
*/ 
*/ 
*/ 
*/ 
V 

*/ 
*/ 
*/ 
*/ 



/************************* ATOM 

/* 

/* Define the atom entry record 



char 


*name; 


/* 


int 


type; 


/* 


i32 


status; 


/* 


int 


recno; 


/* 


int 


id; 


/* 


int 


link; 


/* 


int 


SUbst ; 


/* 


List Ptr 


property; 


/* 


List_Ptr 


feature; 


/* 






/* 


int 


nbond ; 


/* 



DEFINITION *******************************/ 

V 

template */ 

atom name */ 

atom type */ 

atom status */ 

cumulative atom record number */ 

atom id (logical atom number) */ 

link to next atom record */ 

offset to substructure containing atom */ 

pointer to list of properties for atom */ 

pointer to list of features including */ 

this atom */ 

number of bonds involving this atom */ 
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List__Ptr 

fpt 

fpt 

} atom, 



conn_atom; 
xyz[3] ; 
charge ; 
*atom_ptr; 



/* pointer to list of bonded atoms 
/* coordinates of atom 
/* point charge on atom 



/* Define the atom array segment descriptor template 
typedef struct atom_seg_struct { 

atom_ptr segjiead; /* pointer to head of atom array segment 
mol_ptr molecule; /* pointer to molecule containing atom seg 
int max_atom; /* maximum number of atom records in seg 

int natom; /* number of filled atom records in seg 

int used_atom; /* offset to first filled record in segment 

int free_atom; /* offset to first free record in segment 

} atom_seg # *aseg_ptr; 

/* Define the bond specifier records pointed to by the atom records 
typedef struct atom_conn_struct { 

int target; /* offset to target atom 

int bond_rec; /* offset to bond descriptor record 

} atom_conn, *acon_ptr; 



*/ 

V 
V 

*/ 
*/ 
*/ 
*/ 



*/ 

*/ 
*/ 
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/************************* BOND 
/* 

/* Define the bond entry record 
tvpedef struct bond struct 



int 


type; 




X 3 £• 




/* 


int 


recno; 


/* 


int 


id; 


/* 


int 


link; 


/* 


List_Ptr 


property; 


/* 


List_Ptr 


feature; 


/* 






/* 


int 


o_subst ; 


/* 


int 


origin; 


/* 


int 


t_subst; 


/* 


int 


target; 


/* 


} bond, 


*bond_ptr; 





DEFINITION *******************************/ 

*/ 

template 

bond type 
bond status 

cumulative bond record number 
bond id (logical bond number) 
link to empty bond record 
pointer to bond property list 
pointer to list of features including 

this bond 
offset to origin atom substructure 
offset to atom at bond origin 
offset to target atom substructure 
offset to atom at bond destination 



Define the bond array segment descriptor template 
typedef struct bond_seg_struct { 



bond_ptr 

mol_ptr 

int 

int 

int 

int 



seg_head; 
molecule; 
max_bond; 
nbond ; 
used_bond; 
free bond; 



} bond_seg, *bseg_ptr; 



/* 
/* 
/* 
/* 
/* 
/* 



pointer to head of bond array segment 
pointer to molecule containing bond seg 
maximum number of bonds in segment 
number of filled bond records in seg 
offset to first filled record in segment 
offset to first free record in segment 



*/ 

*/ 
*/ 
*/ 
*/ 
*/ 
*/ 
*/ 
*/ 
*/ 
*/ 
*/ 
*/ 



*/ 
*/ 
*/ 
*/ 
V 

*/ 
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/+ ===== comfa.h ====== */ 

/* Regions are the set of points at which energy evaluations are made */ 

/* in the CoMFA method of QSAR. A region is defined as the union */ 

/* of a set of 3D boxes (which may be a single point in the */ 

/* limit) and their associated attributes. Attributes needed for */ 

/+ CoMFA purposes are outlined below. */ 

/* */ 
/.************* ***************************^ 

#ifndef QSAR_COMFA_DEFINITIONS 

#define QSAR_COMFA_DEFINITIONS 1 

# include "ta_types.h n 

^define DUMMY 26 /* dummy atom id */ 

ttdefine LP 20 /* lone pair atom id */ 



typedef enum { 
FDENGY_UNKNOWN, 
FDENGY_ELECT, 
FDENGY_STERIC, 
FDENGY_HOMO , 
FDENGY_LUMO, 
DOCK_ELECT, 
D0CK_STA_N0HB, 
DOCK_STA_HBD, 
DOCK_STA_HBA , 
D0CK_STB_N0HB, 
DOCK_STB_HBD, 

DOCK_STB_HBA } FldEngyTyp; 

typedef enum { 
FDHD_ORIGINAL, 
FDHD_FFIT, 
FDHD_XTERN, 
FDHD_FUNC, 
FDHD_USER, 
FDHD_USR_AVG , 
FDHD_D0CK , 
FDHD_AVG , 
FDHD_SIG, 
FDHD_MAX, 
FDHD_MIN, 
FDHD_COEFF, 
FDHD_AVG_X, 
FDHD_SIG_X, 
FDHD_FLD_X, 
FDHD_ RANGE , 
FDHD_PLS_XWT # 
FDHD_PLS_XLOAD, 
FDHD_FAC_LOAD , 
FDHD_FAC_COMM , 
FDHD_FAC_ROTLOAD , 
FDHD_S IMCA_L0AD , 
FDHD_S IMCA_MODEL , 
FDHD_SIMCA DISCRIM, 
FDHD_HBD J FldHowTyp; 



typedef struct { 
fpt lo[3], 

hi[3] , 

stepsize [3] 
int nstep[3] , 

n; 

int atom_type; 
fpt pt_charge; 
fpt *weight; 
int avg_type; 
fpt avg_scale; 
int arb, 
*parb; 

} 



corner with lowest values for each axis 

it " h i - e s t " n 11 " 

increment between points 
/* derived as l + (hi-lo + epsilon) / stepsize 
/* n = product of nstepti] 

SYBYL atom type, for steric energy computation 
elemental charge at point, for electrostatics 
/* weight [n] is applied in all computations, e.g=l 
/* box of 'scale', sphere, sphere x vdw, ...? 
/* scale whose meaning derived from avg_type 
/* arbitrary int for later use 
/* •* pointer 

Box, *BoxPtr ; 



/* 
/* 
/* 



/* 
/* 



typedef struct { 
char * filename ; 
int n_boxes; 
int n_points ; 
BoxPtr box_array; 
int n_refs ; 
long when_made; 



/* 
/* 
/* 
/* 
/* 
/* 



name of the region's file (if any) 
number of boxes which make up the region 
number of points in this region altogether 
box_array [n_regions] , each one a Box 

number of CURRENT references to this memory 
creation stamp 



} Region, *RegionPtr 



typedef struct { 
char *reg_name; 
char * f ld_name ; 
RegionPtr reference; 
FldEngyTyp fid; 
int num_avgd; 
int curr_iter; 
char *mol_id; 



/* name of the region's file (if any) 
/* name of this field's file (if any) 
/* the region referenced by this field 

/* what type of field is referenced here 
/* number of fields averaged into this one 
/* number of iterations in current field fit 
/* unspecified molecule id, 

e.g. dbname/molname/alignname 



*/ 
*/ 
*/ 
*/ 
*/ 
*/ 
*/ 
*/ 
V 
*/ 
*/ 
*/ 



*/ 

*/ 
*/ 
*/ 
*/ 
*/ 



run 



*/ 
*/ 
*/ 

*/ 
*/ 
*/ 

*/ 



int n_points ; 
int zap_el ; 
fpt max_value; 
fpt * f ield_value ; 
int n_refs ; 
long when_made ; / 
int vol_avg_type ; 
fpt scale_vol_avg; 
int dielectric; 
int repulsive; 
FldHowTyp how_made; 
} Field, *FieldPtr 



*/ 
*/ 
*/ 



/* number of points in associated region */ 
/* whether electrostatics are MISSING when>max_st */ 
/* largest permitted absolute value of energy 
/* values at each point of the field 

* number of CURRENT references to this memory 

* creation stamp 
/* added these 4 items 1/30/89 DEP */ 



/* perry's way = 1 or old way = 0 */ 
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/* molecule dependent information solicited by QSAR table operations, 
passed into COMFA column field evaluations */ 



typedef struct { 
boolean already_f ield; 
char *some_name; 
char *steric_name; 
char *elect_name; 
FieldPtr sfld_p; 
FieldPtr efldjp; 



/* 
/* 
/* 
/* 
/* 
/* 



} ComfaMol, *Comf aMolPtr ; 



whether a field name exists (otherwise alignment) 
name of alignment; Nil align==use as is (!) 

name of steric field (if applicable) 

name of electrostatic field (if applicable) 
points to steric field in memory (when there) 
points to elect, field in memory (when there) 



*/ 
*/ 
*/ 
*/ 
*/ 
*/ 



/* molecule- independent information for CoMFA evaluations */ 
typedef struct { 

/* case for. volume averaging: 0, 1, 2=none, box, sphere (0) */ 



int 


vol_avg ; 


/* 


fpt 


vol_scale ; 


/* 


int 


fld_types ; 


/* 


fpt 


steric_max; 


/* 


int 


repulsive ; 


/* 


fpt 


elect_max ; 


/* 


int 


dielectric; 


/* 


int 


elect_out ; 


/* 


char 


*region_name; 


/* 



int du_lp_elect; 



int 
int 



sparei; 
spare2 ; 



0, l^^oth^tericelect. (0) 



steric repulsive exponent - 12 , 10 , or 8 (12) 
maximum electrostatic energy (30) 
case for dielectric (AS FORCE FIELD TAILOR) 
case to drop elect inside steric max: 0,1=T,F 



/* name of region used in the CoMFA computations 



(1) 



FieldPtr sweight_fld; /* 
FieldPtr eweight_fld; /* 

FldHowTyp how_done; 

int du_lp_steric; 



/* 

/* 

/* 
/* 



} ComfaTop, *ComfaTopPtr; 
#endif 



points to MEMORY field for weighting steric PLS 
points to MEMORY field for weighting elect. PLS 

/* perry's way = 1 or old way = 0 */ 
include dummies and lone pairs in steric field 
calculations */ 

include dummies and lone pairs in electrostatic 
field calculations */ 

As of 6.1comfa , this is TAILOR! COMFA! TRANSFORM*/ 
INDICATOR SCALE among other things * 



*/ 
*/ 

*/ 
*/ 
*/ 
*/ 

V 



*/ 
*/ 
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Section III-B. Functional descriptions of external procedures. 
(Routines that simply return dynamic memory to the heap are not 
described.) 

BOND_V_ERING - TRUE if bond is in an external ring. 

BOND_V_IRING - TRUE if bond is in an internal (simple) ring. 

QSAR_FIELD_EVAL_GETOFF - provides coordinates for field 
computation when "volume averaging" is being done. 

QSAR_FIELD_VDWTAB - returns steric parameters for the 
computation of the field contribution from the probe atom and each 
of the molecule atoms. 

SYB_AREA_GET_MOLECULE - returns the internal representation of 
the molecule in some area or "container", if such exists. 

SYB_ATAB_ATOMIC_NUMBER - returns the atomic number of the 
specified atom type. 

SYB_ATAB_ATOMIC_WEIGHT - returns the atomic weight of the 
specified atom type. 

SYB_ATAB_HBOND_ACCEPT - returns TRUE if the specified atomic 
type is a hydrogen-bond accepting atom. 

SYB_ATAB_VDW_RADII - returns the atomic radius of the specified 
atomic type. 

SYB_ATOM_FIND_ID - returns the internal representation of an atom 
referenced by its atom ID number (Atom IDs are guaranteed to be 
continuous but the ID of any single atom may change as atoms are 
added or deleted.) 

SYB_ATOM_FIND_REC - returns the internal representation of an 
atom referenced by its record ID number. (Atom record IDs are 
invariant but there may be "holes" in their sequence such that the 
largest record ID may be greater than the number of atoms.) 

SYB_ATOM_FIND_SET - returns the bitset of atoms corresponding to 
a list of atoms. 



A-37 




SYBJ30ND_FIND_REC - returns the internal representation of a bond 
referenced by its (invariant) record ID number. 

SYB_BTAB_MNEM_TO_TYPE - converts an ASCII representation of a 
bond type to its internal representation. 

SYB_EXPR_ANALYZE - parses a user-entered ASCII description of 
atoms (e.g., M2(<H>) for all hydrogen atoms within molecule M2) 
into internally valid representations of molecule and atoms. 

SYB_HBOND_DONORS - returns the set of IDs for atoms which are 
hydrogen-bonding hydrogens. 

TAILOR_STORE_IT_HERE - returns the current value of a user- (and 
SPL-) accessible variable. 

TBL_ACCESS_INDEX_TO_COLNAME - converts a user-provided MSS 
column ID to a column name (name is guaranteed to be a unique 
identifier). 

TBL_GRAB_COMPLETE_FPTS - done returning multiple (scalar) values 
in an MSS column to an array. 

TBL_GRAB_GET_FPTSJNV - in a multiple value retrieval, returns the 
value corresponding to a user-provided row ID. 

TBL_GRAB_INIT_FPTS - set up for returning multiple (scalar) values 
in an MSS column to an array. 

UBS_OUTPUT_MESSAGE - equivalent to fprintff) 

UIMS2_VAR_GET_TOKEN - returns the current value of a global SPL 
variable. 

UIMS2_WRITE_ERROR - writes text to the error output stream. 

UTL_FILE_FCLOSE, UTL_FILE_FOPEN - equivalent to fclose() and 
fopen(). 

UTL_LIST_RETRIEVE - returns the next element on a linked list. 
UTL_MEM_ALLOC - equivalent to malloc(). 
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UTL_SET_ANDJNPLACE- makes the first set logically equivalent to 
the second set, with only those bits that are also 1 in the third set 
becoming 1 in the first set. 

UTL_SET_CARDINALITY - returns the number of bits that are 1 in a 
particular bitset. 

UTL_SET_CLEAR - sets all bits in the set to 0. 

UTL_SET_COPY_INPLACE - makes the first set logically identical to 
the second. 

UTL_SET_CREATE - creates and returns an empty set of requested 
size. 

UTL_S ET_DELETE - sets the specified bit to 0. 

UTL_SET_DIFF_INPLACE - makes the first set logically equivalent to 
the second set, with all bits that are 1 in the third set becoming 0 in 
the first set. 

UTL_SET_EMPTY - TRUE if all bits in the set are 0. 
UTL_SET_INSERT - sets the requested bit to 1. 

UTL_SET_MEMBER - returns TRUE if the requested set bit equals 1. 

UTL_SET_NEXT - returns the identity of the next non-zero bit in a 
set. 

UTL_SET_OR_INPLACE - makes the first set logically equivalent to 
the second set, with all bits that are 1 in the third set becoming 1 in 
the first set. 

UTL_STR_CMP_NOCASE - non-case sensitive version of strcmpQ. 
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APPENDIX n B n 



/* CODE. This code implements a PHORE_LOC column type and calculates a single 
cell value (the Hydrogen Bonding Fingerprint for a molecule) within the SYBYL 
Molecular Spreadsheet. It is to be understood that other supporting code handles 
user input, user output, and disk file I/O. */ 



/* data structure for PHORE_LOC column type 
typedef 

struct PHORE { 

char *disco_fn; 
appears below */ 

int disco_in; 

char *region_fn; 

RegionPtr rgn; 

int nfuzz; 
for each PHORE feature */ 

int nbits; 
fails) */ 

} PHORE, *PPHORE; 



7 



/* user name for DISCO feature file - default 



/* 
/* 
/* 
/* 



internal flag if DISCO feature file loaded */ 
user name for defining region file */ 
internal reference to region when loaded */ 
number of extra lattice points (each direction) 



/* set length (must agree with rgn contents or EVAL 



/*+E:QSAR PROC EVAL PHORE_LOC */ 
/*********************************^^ 

/* int QSAR_PROC_EVAL_PHORE_LOC ( table name , row, colname) */ 



/ 

/* Dick Cramer 31-Jul-95 (PHORE_LOC == lattice bitset ) */ 

/* */ 

/* This module generates bitsets whose cardinality is equal to */ 

/* lattice points x 2 (# of sitepoint classes. For each */ 

/* instance of a pharmacophore point in the molecule being */ 

/* processed, the geometrically nearest (l+m)~3 bits in the */ 

/* bitset will be set to 1 (where m is user supplied) . */ 

/* *^ 

/* NOTE: this routine explicitly requires that sets begin after a */ 

/* first element that is the set size!!! */ 

/* */ 

/* Inputs */ 

/* */ 

/* Outputs */ 



/ 



* 



I 



/* User Required Definition Files */ 
/* */ 
/********************************^^ 
/*-E*/ 

int QSAR_PROC_EVAL_PHORE_LOC(tablename, row, colname) 
char *tablename, *colname; 
int row; 
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{ 

moljptr mol; 

PPHORE phr; 

int err, status, nvalid, mol_area; 

char *dum; 

set_ptr print, qsar_proc_calc_phore set(); 
FILE *fp; 



/* get the molecule */ 

if ( i TBL_UTL_GET_MOLECULE (tablename , row, FALSE, &mol) ) 

if ( UTL_ERROR_IS_SET ( ) ) {err=l; goto 

error; } 

else return FALSE; 

} 

/* get the user-provided input data */ 

if ( ! TBL__ATTR_FIND_COLUMN_A ( tablename , colname, "PROC_SUPPORT" , &dum, 

(int *)&phr) } {err=3; goto 

error ; } 

/* retrieve DISCO stuff if not yet present */ 
if ( ! phr->disco_in) { 

if ( !phr->disco^fn) {err=l; goto error;} 
/* set appropriate tailor value, then initialize DISCO */ 

sprintf( str, "SETVAR TAILOR! DISCO! FILE %s", phr->disco fn ); 

UIMS2_EXEC_C0MMAND( str ); 

UIMS2_EXEC_C0MMAND( "DISCO INIT" ); 

phr»>disco_in « TRUE; 

} 

/* retrieve region if not yet present */ 
if (!phr->rgn ) { 

if ( !phr->region_fn) {err=l; goto error;} 
if (!(phr->rgn = QSAR_REGION_RETRIEVE( phr->region_f n ) )) 
{err=4;goto error;} 

if (phr->rgn->n_boxes > 1 ) { 

sprintf( str, "WARNING: Region %s has %d boxes. Only first 
will be used. \n", 

phr->region_fn, phr->rgn->n_boxes ); 
UBS_OUTPUT_MESSAGE( stdout, str ); 

} 

phr->nbits = 2 * phr->rgn->n points; 

} 

/* evaluate this result, first the DISCO call */ 

if (!( print = qsar_proc_calc_phore_set ( mol, phr, &nvalid )) ) {err=12; 
goto error;} 

/* go store both the bitset in the MSS "Cell_Support" and the number of bits 
actually set in the "CELL", so there's something for the user to see */ 
if ( !TBL_ACCESS_X_PUT_VALUE (tablename, row, colname, "CELL_SUPPORT" , 

(int *)&print) ) {err=ll; goto error;} 
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if ( !TBL ACCESS_X_PUT_VALUE(tablename, row, colname, 

(int *) invalid) ) 



"CELL" , 
{err=ll; goto 



error; } 



return TRUE; 
error: 

sprintf (str, »QSAR_PROC_EVAL_PHORE_LOC (%d)", err) ; 
UTL_ERROR_ADD_TRACE (str) ; 
return FALSE; 



set_ptr anset = NIL, pset = NIL, SYB_FEAT_FIND_ID_SET () ; 
feat_ptr featp, SYB_FEAT_FIND_REC ( ) ; 
atom_ptr a, SYB_ATOM_FIND_REC ( ) ; 

int err, elem, sitebase, ci, xybase, boff, lt_base[3], lt_off[3], loff = 
0, hioff = 0 ; 
fpt tmp; 

BoxPtr bxptr ; 

line_ptr cdp; 

if (!( anset = UTL_SET_CREATE( phr->nbits ) )) {err = 1; goto error;} 

*nvalid = 0; 

if (phr->nfuzz) { 

loff -= phr->nfuzz / 2; 

hioff += (phr->nfuzz + 1 ) / 2; 

} 

bxptr = phr->rgn->box_array; 

xybase = bxptr->nstep[0] * bxptr->nstep[l) ; 

/* generate the DISCO sites for this molecule, which */ 
UIMS2_EXEC_C0MMAND( "ECHO %DISCO_SITES ( ) " ); 

/* become "FEATURES" + "dummy atoms" within SYBYL's molecule data 
structure */ 

pset = SYB_FEAT_FIND_ID_SET (mol , FEAT_V_LINE, 1, mol->nfeats) ; 
if (pset ) { 
e 1 em = ~ l * 

while((elem = UTL_ SET_NEXT (pset , elem) ) != N0_M0RE_ELEM) { 
if (i (featp = SYB_FEAT_FIND_REC (mol,elem))) goto error; 
if ( (featp->name[l] == 'S') && (featp->name[2] == '_')) { 
/* have an H-bonding feature, it must represent a line */ 

sitebase = f eatp->name[ 0] == 'A' ? 0 : phr->rgn->n_points; 
/* the dummy atom at the end of the line is our H-bonding locus */ 
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set_ptr qsar_proc_calc_phore_set( mol, phr, nvalid ) 



/* creates actual bitset */ 
moljptr mol; 
PPHORE phr; 
int * nvalid; 




cdp = (line_ptr) f eatp->dataptr; 

if (!(a = SYB_ATOM_FIND_REC (mol, cdp->positn) ) ) {err=2; goto 

error ; } 

for <ci = 0; ci < 3; ci++ ) { 

tmp = (a->xyz[ci] - bxptr->lo[ci] ) / bxptr->stepsize[ci] ; 
lt_base[ci] = (int) (tmp < 0.0 ? tmp - bxptr->stepsize[ci] : 

tmp ) ; 

/* cycle through all points touched by this locus that are also' within the 
region */ 

for (lt_off[0] = lt_base[0] + loff; lt_off [0] <= lt_base[0] + hioff; 
lt_off [0]++) 

if (lt_off[0] >= 0 && lt_off[0] < bxptr->nstep[0]) 

for (lt_off[l] = lt_base[l] + loff; lt_off[l] <= lt_base[l] + 
hioff; lt_off[l]++) 

if (lt_off[l] >= 0 && lt_off[l] < bxptr->nstep[l] ) 

for (lt_off[2] = lt_base[2] + loff; lt_off[2] <= lt_base[2] + 
hioff; lt_off[2]++) 

if (lt_off[2] >= 0 && lt_off[2) < bxptr->nstep[2] ) { 
boff = xybase * lt_off[2] + 

(bxptr -> nstep[0]) * lt_off[l] + 
lt_off[0] + sitebase; 
UTL_SET__INSERT( anset, boff ); 
(*nvalid)++; 

} 

} 

} 

UTL_SET_DESTROY ( pset ) ; 
} /* pset exists */ 

return ( anset ) ; 

error: 

sprintf (str, "qsar_proc_calc_phore_set (%d) " , err) ; 
UTL_ERROR_ADD_TRACE (str) ; 
return FALSE; 

} 



f This file determines the recognition of site points in Sybyl/DISCO. 

# See the SYBYL DISCO manual for detailed documentation. The defined types 



# (1) HB : the QUERY is searched in the SEARCH mode, and all occurences 

# are assigned DISCO features according to the remaining 

# specifications « the three ATOMS refer to the atom number 

# in QUERY such that the feature is DIST from the first atom 

# at bond ANGLE with the first and second atom at each of the 

# TORSIONS formed by the site point and the three ATOMS in order. 

# A sitepoint of NAME is added at these extension points, 

f — an< j — the first atom is assigned a feature complimentary 
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# 
t 

t 
# 
t 

i 

#TYPE NAME 



to the extension point (such as HBD_CO_ and RHBD_CO_) . 
(2) HBexrdif fers from HB in that the angles and torsions are replaced 
by two other arguments: whether lone pairs are part of the 
extension point placement, and which ATYPE (generally LP 
and/or H) determine the direction of the sitepoints. 



ATOMS SEARCH DIST ANGLE TORSIONS 



HB 
HB 
HB 
HB 
HB 
HB 
HB 



DS_02C2_ 
DS_03Car 
DS_03Car" 
DS_03Car" 
DS_03Car" 
DS_03Car~ 
DS 03C3 



4 2 1 NoDup 2, 
13 4 All 2.9 



All 2,< 
NoDup 
NoDup 
All 2, 



120 "0.0 180, 
119 "0.0 180.0" 
119 "0.0 180.0" 



119 
119 



"0.0 
"0.0 



2.9 
2.9 

,9 120 
NoDup 2.9 117 
Off ]HC(Any) (Any) C (Any) (Any) Any 
HB DS_N3C3_ 14 5 NoDup 2.9 110 
HB DS_02S_ 3 2 1 All 2.9 120 "0. 
#TYPE NAME ATOMS SEARCH DIST LP 

HBex DS_03C3_ 2 13 NoDup 2.9 YES "LP 
0[f]HC(Any) (Any) Z{ Z:Hev& ! C(Any) (Any) Any} 
HBex DS_03C3_ 3 12 NoDup 2.9 YES "LP" 
HBex DS_N3C3_ 2 14 Nodup 2.9 "" "H" 
N [ f ] H2 YaZ { Z : Hev& ! C }{ Ya : C& ! C=0& ! C : He v } 
HBex DS_N3C3_ 2 13 NoDup 2.9 YES "LP H" 
HBex DS_N3C3_ 3 12 NoDup 2.9 YES "LP" 
N[f ] (Ya) (Ya) Ya{Ya:C& !C=0& !C:Hev} 



180 



180.0" 



QUERY 

0" HevC(Any)=0[f ] 
0[f ]HC( :Hev) :Hev 
0[f ]C( :Hev) :Hev 
0" 0[f]HC(=0) 



"0.0 180.0" C( 
"60 180 300" 



0[f]C(«0) 
:0[f]):0[f] 



"60 180 300" N[f]H2ZC{Z: 
0 180" AnyS(=0) (=0)NH 
ATYPE Query 



C&!C=0&!C:Hev} 



H" 



0[f ] (Z) Z{Z:C&!C=Het} 



N [ f ] H ( Ya) Ya{ Ya : C& ! C=0&!C:Hev} 



HBex 

HBex 

HBex 

HBex 

HBex 

HBex 

hb 

hb 

hb 

hb 

hb 

hbex 
hb 
hb 
# 

HB 
HB 



DS_N2C2 

DS_N2C2" 

DS_N2C2' 

DS_N2N2~ 

DS_N2N2" 

DS_N2N2" 

DS_03S_" 

DS_03S_ 

DS_03S_ 

DS_03N_ 

DS_02N_ 

DS_N2N2 

DS_03P_" 

DS 03P 



2 13 NoDup 3.0 
12 3 NoDup 3.0 
12 3 NoDup 3.0 
2 13 NoTriv 3 . 

2 13 NoTriv 3. 

3 2 1 NoDup 3. 
2.9 



NoDup 
All 2.9 
All 2.9 
All 2.9 
NoDup 

1 NoDup 

2 All 2 
2 All 2 



YES 
YES 
YES 
,0 YES 
0 YES 
,0 YES 
128 
128 "0 
128 "0 
128 "0 

128 
0 YES 



"H LP" N[f]H=C 
"H LP" Any~N[f]=C 
"LP" Any~N[r]=C[r] 
"LP H" N[l]H:C:C:N[f ] 
"LP H" N[l]H:C:C:N[f ] 
"LP" C:N[f]:Hev 
"0.0 180.0" 



:C:@1 
:C:§1 



0 180.0" 
0 180.0" 
0 180.0" 
"0,0 180. 
"LP" 



0" 



128 
128 



"0, 
"0. 



180.0" 
180.0" 



#CLASSNAMES# Acceptor_site Donor_Atom DL 



HevS=0[f ] 
HevS(=0[f ])=0[f ] 
HevS(~0[f ]) (~0[f))- 
HevN(0[f ] )0[f ] 
HevN(Hev) ~0[f ] 
N:N[f ] :N 
P(~0) (-0) (-0) (-0) 
P(-O) (-0) (-0) 



•o[f] 



0 180.0" 0[f]HC(; 
60 180 300" 



AS_H03C2_ 13 4 All 2.9 119 "0, 
AS_H03C3_ 13 6 NoDup 2.9 117 1 
0[f ]HC(Any) (Any) C (Any) (Any) Any 
HB AS_N3C3_ 14 7 NoDup 2.9 
N[f]H2C(Any) (Any)C(Any) (Any)Any 
HB AS_N3C3_ 15 8 NoDup 2,9 
N[f ]H3C(Any) (Any)C(Any) (Any)Any 

#TYPE NAME ATOMS SEARCH DIST LP ATYPE Query 



Hev) :Hev 



110 "60 180 300" 



110 "60 180 300" 
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HBex AS_HN2C2 
HBex AS_HN2C2" 
HBex AS_HN2C2" 
HBex AS H03C3" 



"H" NHC(Any)=0[f ] 
YES "LP H" C:N[f]H:Hev 



YES "LP" N[l]H:C:C:N[f ] :C:§1 
YES "LP H" 



2 13 NoDup 3.0 
3 2 1 NoDup 3.0 
6 5 4 NoTriv 3.0 
_ _ 2 1 3 NoDup 2.9 

0[f]HC(Any) (Any)Z{Z:Hev&!C(Any) (Any) Any} 

HBex AS_HN2C2_ 3 2 4 Nodup 3.0 YES "LP H" HevN[f]H=C 
HBex AS_HN2C2_ 12 3 Nodup 3.0 YES "LP" HevN[f]=C 
HBex AS_HN2C2_ 2 14 Nodup 3.0 "" "H" N[f]H2C(N)=N 
HBex AS_N3C3_ 2 14 Nodup 2.9 YES "LP H" 
N[f]H2C(Any) (Any)Z{Z:Hev&!C(Any) (Any) Any} 
HBex AS_N3C3_ 2 15 Nodup 2.9 YES "LP H" 
N [ f ] H3C (Any) (Any) Z { Z : Hev& I C ( Any ) (Any) Any } 



HBex AS_N3C3_ 
HBex AS_N3C3_ 
HBex AS_N3C3_ 
HBex AS_N3C3_ 
HBex AS_HN2C2_ 
HBex AS_HN2C2_ 
HBex AS_HN2C2_ 
HBex AS_HN2C2_ 
HBex AS_HN2C2_ 
HBex AS_HNS3_ 
HBex AS_HN4_ 2 
hbex AS_HN2N2_ 
hb AS_03P_ 
hb AS 03P 



NoDup 
NoDup 
NoDup 
NoDup 
3 NoDup 
NoDup 
NoDup 
NoDup 
NoDup 
6 5 2 NoDup 
1 3 NoDup 



9 
9 
9 
9 
3 
3 
3 
3 
3 
3.0 
-3.6 "" 



3 2 1 NoDup 3.0 
3 12 All 2.9 128 
3 12 All 2.9 128 



YES "LP H" N[f ]H(Ya)Ya{Ya:C&!C=0&!C:Hev} 
YES "LP H" N[f ]H2(Ya)Ya{Ya:C&iC=0&!C:Hev} 
YES "LP H" N[ f }H(Ya) (Ya) Ya{ Ya : C& ! C=0& ! C:Hev} 
YES "LP" N[f ] (Ya) (Ya) Ya{Ya:C&!C=0&!C:Hev} 
0 YES "H LP" N[f]H=C 
YES "LP" N[f]=C~Any 

N[f ]H2Hev( :Hev) :Hev 
N[f )HHev(:Hev) :Hev 
HNC=Any 
AnyS(=0) (=0)N[f ]H 
N[f ] (Z) (Z) (Z) Z{Z:C&iC=0&iC:Hev} 
N:N[f ] :N 
P(-O) (-0) (-0) (-0) 



III llpjll 
III II^JII 

Ml M^JII 

t ll H tl 

"C*" 

YES "LP" 
"0.0 180.0" 
"0.0 180.0" 



P(~0) (-0) (-O) 
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APPENDIX "C 



EXPERIMENTAL DATA SETS 



Data Set 

1 Uehling 

2 Strupczewski 

3 Siddiqi 

4 Garrattl 

5 Garratt2 

6 Heyl 

7 Cristalli 

8 Stevenson 

9 Doherty 

10 Penning 

11 Lewis 

12 Krystek 

13 Yokoyamal 

14 Yokoyama2 

15 Svensson 

16 Tsutsumi 

17 Chang 

18 Rosowsky 

19 Thompson 

20 Depreux 



No. Of Cpds 
9 

34 
10 
10 
14 
11 
32 

5 

6 
13 
7 

30 
13 
12 
13 
13 
34 
10 
8 

26 



Structure. Activity 
camptothecin, DNA fragmentation 
benzisoxazoles, ip Behavioral 
adenosines, Brain Al binding 
tryptamines, melanophore binding 
tryptamines, melanophore binding 
deltorphin, opioid receptor (DAMGO) 
adenosines, A2a agonists 
piperidines, NK1 antagonism 
triarylbutenolides, endothelin-A an tag. 
SC-41930 analogs, LTB4 antagonism 
oxazolinediones, NK1 binding 
sulfonamides, endothelin-A antagonism 
oxamic acids, T3 binding 
oxamic acids, T3 binding 
benzindoles, 5-HTA agonism 
peptidyl heterocycles, endopeptidase inhib 
biphenyl sulfonamides, ATI binding 
trimetrexate analogs, DHFR inhibition 
peptidomimetic, HIV-1 protease inhibition 
naphthylethyl amides, melatonin dispL 



Literature References for Data Sets: 

1. Uehling, D.E., Nanthakamur, S.S., Croom, D., Emerson, D.L., Leitner, P.P., 
Luzzio, M.J., et al., Synthesis, Topoisomerase I Inhibitory Activity, and in Vivo 
Evaluation of 11-Azacamptothecin Analogs. J. Med. Chem. 1995, 38, 1106 (Table 2, 
with R 2 =Et; IC 50 data. 

2 Strupczewski, J.T., Bordeau, K.J., Chiang, Y., Glamkowski, E.L, Conway, P.G., et 
al 3-[[(aryloxy)alkyl]piperidinyl]-l,2-Benzisoxazoles as D2/5-HT2 Antagonists with 
Potential Atypical Antipsychotic Activity: Antipsychotic Profile of Iloperidone 
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(HP873). J. Med. Chem. 1995, 38, 1119. (Tables 2 and 3 with n=3, X=0; ED J0 for 
inhibition of apomorphine-induced climbing.) 

3. Siddiqi, S.M., Jacobson, K.A., Esker, J.L., Olah, M.E., Ji, Xi.-duo., et al, Search 
for New Purine- and Ribose-Modified Adenosine Analogs as Selective Agonists and 
Antagonists at Adenosine Receptors. J. Med. Chem. 1995, 38, 1174. (Table 1, 
R 2 =H; K,(A1), values estimated from % displacement and stereoisomers averaged as 
needed.) 

4. Garratt, P. J., Jones, R., Tocher, D. A., Sugden, D., Mapping the Melatonin 
Receptor. 3. Design and Synthesis of Melatonin Agonists and Antagonists Derived 
from 2-Phenyltryptamines. J. Med. Chem. 1995, 38, 1132. (Table 1 and Table 2). 

5. Garratt, P. J., Jones, R., Tocher, D. A., Sugden, D., Mapping the Melatonin 
Receptor. 3. Design and Synthesis of Melatonin Agonists and Antagonists Derived 
from 2-Phenyltryptamines. J. Med. Chem. 1995, 38, 1132. (Table 1 and Table 2). 

6. Heyl, D.L., Dandabuthla, M., Kurtz, K.R., Mousigian, C. Opioid Receptor Binding 
Requirements for the &-Selective Peptide Deltorphin I: Phe 3 Replacement with Ring- 
Substituted and Heterocyclic Amino Acids. J. Med. Chem. 1995, 38, 1242. (Table 1; 
binding K, to DAMGO.) 

7. Cristalli, G., Camaioni, E., Vittori, S., Volpini, R., Borea, P.A., et al. 2-Aralkynyl 
and 2-Heteroalkynyl Derivatives of Adenosine-5'-N-ethyluronamide as Selective A2a 
Adenosine Receptor Agonists. J. Med. Chem. 1995, 38, 1462. 

8. Stevenson, G.I., MacLeod, A.M., Huscroft, I., Cascieri, M.A., Sadowski, S., 
Baker, R. 4,4-Disubstituted Piperidines: A New Class of NK, Antagonist. J. Med. 
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Chem. 1995, 38, 1264. (Table 1.) 

9. Doherty, A.M., Patt, W.C., Edmunds, JJ. Berryman, K.A., Reisdorph, B.R., et al. 
Discovery of a Novel Series of Orally Active Non-Peptide Endothelin-A (ETJ 
Receptor-Selective Antagonists. J. Med. Chem. 1995, 38, 1259. (Table 3; IC 50 ET A .) 

10. Penning, T.D., Djuric, S.W., Miyashiro, J.M., Yu, S., Snyder, J.P., et al. Second- 
Generation Leukotriene B 4 Receptor Antagonists Related to SC-41930; Heterocyclic 
Replacement of the Methyl Ketone Pharmacophore. J. Med. Chem. 1995, 38, 858. 
(Table 1, all; LTB 4 receptor binding.) 

11. Lewis, R.T., MacLeod, A.M., Merchant, K.J. Kelleher, F., Sanderson, I., et al. 
Tryptophan-Derived NK1 Antagonists: Conformational^ Constrained Heterocyclic 
Bioisosteres of the Ester Linkage. J. Med. Chem. 1995, 28, 923. 

12. Krystek, S.R., Hunt, J.T., Stein, P.D., Stouch, T.R. 3D-QSAR of Sulfonamide 
Endothelin Inhibitors. J. Med. Chem. 1995, 38, 659. 

13. Yokoyama, N., Walker, G.N., Main, A.J. Stanton, J.L. Morrissey, M., et al. 
Synthesis and SAR of Oxamic Acid and Acetic Acid Derivatives Related to L- 
Thyronine. /. Med. Chem. 1995, 38, 695. 

14. Yokoyama, N., Walker, G.N., Main, A.J. Stanton, J.L. Morrissey, M., et al. 
Synthesis and SAR of Oxamic Acid and Acetic Acid Derivatives Related to L- 
Thyronine. /. Med. Chem. 1995, 38, 695. 

15. Haadsma-Svensson, S.R., Svensson, K., Duncan, N., Smith, M.W., Lin, Ch.-H. C-9 
and N-Substituted Analogs of cis-(3aR)-(-)-2,3,3a,4,5,9b-Hexahydro-3-propyl-lH- 
benz[e]indole-9-carboxamide: 5HT1A Receptor Agonists with Various Degrees of 
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Metabolic Stability. J. Med. Chem. 1995, 38, 725. 

16. Tsutsumi, S., Okonogi, T. Shibahara, S., Ohuchi, S., Hatsushiba, E., et al, 
Synthesis and Structure Activity Relationships of Peptidyl @-Keto Heterocycles as 
Novel Inhibitors of Prolyl Endopeptidase. J. Med. Chem. 1994, 37, 3492. (Table 2, 

X = CH2CH25IC5Q.) 

17. Chang, L.L., Ashton, W.T., Flanagan, K.L., Chen, Ts.-Bau., O'Malley, S.S., et al, 
Triazolinone Biphenylsulfonamides as Angiotensin II Receptor Antagonists with High 
Affinity for Both the AT, and AT 2 Subtypes. J. Med. Chem., 1994, 37, 4464. (Table 
1, R 3 =(2-Cl)C 6 H 5 ; AT, [rabbit aorta] IC 50 .) 

18. Rosowsky, A., Mota, C.E., Wright, J.E., Queener, S.F., 2,4-Diamino-5- 
chloroquinazoline Analogs of Trimetrexate and Piritrexim: Synthesis and Antifolate 
Activity. J. Med. Chem. 1994, 37, 4522. (Table 2; rat liver IC S0 .) 

19. Thompson, S.K., Murthy, K.H.M., Zhao, B., Winborne, E., Green, D.W., et al. 
Rational Design, Synthesis, and Crystallographic Analysis of a Hydroxyethylene- 
Based HIV-1 Protease Inhibitor Containing a Heterocyclic Pl'-P2' Amide Bond 
Isostere. J. Med. Chem. 1994, 37, 3100. (Table 2, X-Boc; apparent K;.) 

20. Depreux, P., Lesieur, D., Mansour, H.A., Morgan, P., et al. Synthesis and 
Structure-Activity Relationships of Novel Naphthalenic and Bioisosteric Related 
Amidic Derivatives as Melatonin Receptor Ligands. /. Med. Chem. 1994, 37, 3231. 
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APPENDIX "D" 

A list of 736 commercially available thiols broken down into 231 clusters based on topomeric 
CoMFA field descriptors along with the systematic name applicable to each. The 231 clusters 
are sorted by proposed name, first by the "root" structure, ie., the fragment attached 
immediately to the -SH, and then by the substitution pattern on that "root" substructure. The 
names describe topologically equivalent hydrocarbons, ie., structures in which all monovalent 
atoms are replaced by hydrogens and the other atoms by carbons. 
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Cluster 


Cluster 


Struct . 


Structural 


ID 


Size 


Root • 


Substitution 3 


======= 


======= 


====== 




1 


26 


aryl 


Simple 


144 


1 


aryl 


2,3,5 -Me 


177 


1 


aryl 


2 , 3 , 5-Me-4-Pr 


163 c 


1 


aryl 


2 , 3- (4- (2 , 3-Pr) 5het) 5het0 


151 


1 


aryl 


2, 3- (4-Bu) 5hetO-5-Me 


33 


5 


aryl 


2 , 3-Benzo 


80 


2 


aryl 


2, 5-Me 


192 


1 


aryl 


2, 5-Me-3-iPe 


7 


14 


aryl 


'2,6-NoH-3(4/5)-Me 


27 


6 


aryl 


2, 6-NoH-3-Ar 


107 


2 


aryl 


2- (2-Bz) PheEt-4, 5-Benzo 


189 


1 


aryl 


2- (3, 5-Me) Ar-4, 5-Benzo 


141 


1 


aryl 


2-(4-Et)PhePr 


205 


1 


aryl 


2- (4-Stilbenyl) Stilbenyl 


188 


1 


aryl 


2-5hetCH2-4, 5-Benzo 


56 


3 


aryl 


2-Ar 


138 


1 


aryl 


2 -Ar- 3, 5-Me 


190 


1 


aryl 


2-Ar-4,5- (3,4-Et)Benzo 


41 


6 


aryl 


2-Ar-4, 5-Benzo 


152 


1 


aryl 


2-Bz 


16 


9 


aryl 


2-Et 


85 


2 


aryl 


2-NoH-3-Et-5-Me 


106 


2 


aryl 


2-PheEt-4 # 5-Benzo 


77 


2 


aryl 


2-PhePr 


142 


1 


aryl 


2-R8 


121 


2 


aryl 


2 -Stilbenyl 


97 


2 


aryl 


3 , 4- (3-Me) Benzo 


218 


1 


aryl 


3 , 4- (a,b) IndenO 


164 


1 


aryl 


3,4-(a,b, (8-Ar) IndenO) -6-Me 


98 


2 


aryl 


3,4-(a,b, (c-Me) IndenO) 


99 


3 


aryl 


3,4- <a,b-Naphtho) 


157 


1 


aryl 


3,4-Ar 


58 


3 


aryl 


3 , 4-Benzo-5-Me 


100 


2 


aryl 


3 , 4-Benzo-6-tBu 


37 


5 


aryl 


3, 5-Me 


180 


1 


aryl 


3- (2 , 3-Benzo-4-Et) 5het 


199 


1 


aryl 


3- (2,3-Benzo-5-Me) 5het 


182 


1 


aryl 


3- (2-Me-3-5het-5-Et) 5het 


115 


2 


aryl 


3-(3-5het)5het 


193 


1 


aryl 


3-(3-Ar)5het-4-Me 


67 


3 


aryl 


3-Ar 


129 


2 


aryl 


3-Ar-4-(2-Me)5hetCH2 


46 


4 


aryl 


3-Ar -5 -Me 


155 


1 


aryl 


3-Bz 


82 


2 


aryl 


3-Bz-5, 6-Benzo 


10 


16 


aryl 


3-Me 




70 


3 


aryl 


73 


3 


aryl 


95 


2 


aryl 


88 


2 


aryl 


81 


2 


aryl 


48 


4 


aryl 


2 


23 


aryl 


92 


2 


aryl 


90 


4 


aryl 


19 


8 


aryl 


148 c 


1 


aryl 


228 


1 


aryl 


12 


10 


5het 


50 


4 


5het 


139 


1 


5het 


89 


2 


5het 


173 


1 


5het 


69 


3 


5het 


198 


1 


5het 


174 


1 


5het 


171 


1 


5het 


170 


1 


5het 


123 


2 


5het 


22 


7 


5het 


202 


1 


5het 


122 


2 


5het 


197 


1 


5het 


6 


14 


5het 


225 


1 


5het 


224 


1 


5het 


63 


3 


5het 


178 


2 


5het 


72 


3 


5het 


40 


5 


5het 


183 


1 


5het 


64 


3 


5het 


105 


2 


5het 


160 


1 


5het 


146 


1 


5het 


203 


1 


5het 


126 


2 


5het 


17 


9 


5het 


211 c 


1 


5het 


124 


2 


5het 


28 b 


6 


5het 


30 


6 


5het 


204 


1 


5het 


79 


2 


5het 


78 


2 


5het 


117 


2 


5het 


186 


1 


5het 


68 


3 


5het 


112 


2 


5het 




3-Naphth 
3-Pr-4-sBu-6-Me 

3- iPr 

4- Ar 
4-Bz 
4-Et 
4 -Me 
4-R9 + 
4-iBu 
6-NoH 

(adenosine) 
( fluorescein) 
Simple 

2,3-(a,b-Naphtho) 
2,3-5hetO-4-Me 
'2, 3-Ar 

2-(2,5-Et)Ar-3-Et 

2- (2-Me) Ar-3- (2-Me) PheEt 

2- (2-Me)Ar-3-Rl0 

2-(2-sBu)-3-Et 

2-(3,5-Me)Ar-3-5het 

2- (3 , 5-Me) Bz-3 , 4-Benzo 

2- (3-Et) Ar-3-Bz 

2-(4-Et)Ar 

2- (4-Et)Ar-4- (4-Me) Ar 
2- (4-iPr)Ar-3-Bz 
2-5hetCH2-3- (4-tBu) Ar 
2-Ar 

2-Ar-3- (2-Ar) ShetBu 
2-Ar-3- (2-Ar) 5hetCH2 
2-Ar-3- (2-Bz)Ar 
2-Ar-3- (2-Me) 5het 
2-Ar-3- (3,4-Et)Bz 
2-Ar-3-(3-Ar)5HetEt 
2 -Ar-3- (3-Ar) PhePr 
2-Ar-3- (3-Ar-5-Me) 5het 
2-Ar-3-(3-Me)Ar 
2 -Ar-3- (4-Ar)Cyhx 
2 -Ar-3- (4-Ar)CyhxCH2 
2-Ar-3- (4-PheEt) Ar 
2-Ar-3- (tBu) Ar 
2-Ar- 3-Ar 

2 -Ar-3 -Benzyl idene 
2-Ar-3-IndenCH2 

2 -Ar-3 -Me 

2-Ar-3-PhePr 

2-Ar-5-(4- (2,4-Me)Bz)Ar 

2-Bz 

2-Bz-3 , 4-Benzo 
2-Cyhx 

2-Cyhx-3 , 4-iPe 
2-Et 

2-Et-3- (2-Me) PheEt 
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128 


2 


5het 


93 


2 


5het 


61 


3 


5het 


181 


1 


5het 


49 


4 


5het 


86 


2 


5het 


91 


2 


5het 


4 


17 


5het 


172 


1 


5het 


38 


5 


5het 


13 


10 


5het 


222 


1 


5het 


66 


3 


5het 


29 


6 


5het 


71 


3 


5het 


108 


2 


5het 


127 


2 


5het 


54 


3 


5het 


221 


1 


5het 


187 


1 


5het 


143 


1 


5het 


96 


2 


5het 


162 


1 


5het 


169 


1 


5het 


94 


2 


5het 


210 


1 


5het 


36 


15 


5het 


176 


1 


5het 


196 


1 


5het 


159 


1 


5het 


42 


4 


5het 


200 


1 


5het 


113 


2 


5het 


125 


2 


5het 


191 


1 


5het 


145 


1 


5het 


114 


2 


5het 


18 


8 


5het 


59 


3 


5het 


65 


3 


5het 


24 


7 


5het 


44 


6 


5het 


. 52 


5 


5het 


111 


2 


5het 


153 


1 


5het 


32 b 


6 


5het 


223 


1 


5het 


185 


1 


5het 


34 


5 


alkyl 


104 


2 


alkyl 


62 


3 


alkyl 


3 


18 


alkyl 


14 


9 


alkyl 




2-Me-3,4-(3 -Me ) Benz o 

2-Me-3 ,4-Benzo 

2-Me-3-(2,3,4-Me)5het 

2-Me-3- (2, 3-Benzo-4-Et) 5het 

2-Me-3-(3-Ar)5het 

2-Me-3- (3-Ar) 5hetPr 

2-Me-3- (3-Ar-5-Me) 5het 

2-Me-3-(3-Bz)Ar 

2-Me-3- (4-tBu) PheEt 

2-Me-3-5Het 

2-Me-3-Me 

2-Me-3-Pe 

2-Me-3-PheEt 

2-Me-3-PhePr 

2-Me-3-R8+ 

-2-Me-5-Bu 

2-Pe-3-Ar 

2-Pr 

2-R12 

2-iBu-3,4-iPe 

2- iPe-3 ,4-Benzo 
3,4-(2,4-Me)Benzo 
3 , 4- (3-Ar) Benzo 
3,4- (3-Hx) Benzo 
3,4-(3-Pr)Benzo 
3,4- (a,b-Napththo) 
3 , 4-Benzo 

3- (2,4-Me)Bz 
3-(3,5-Me)Ar 
3-(3-Ar)5het 
3-(3-Bz)Ar 
3-(3-Me)PheEt 
3- (4-Me)Ar 

3- (4-tBu) Ar 

3-(Al-4-Et)PheEt 

3- (B-Ar) PhePr 

3-5hetCH2 

3-Ar 

3-Ar(2^thia) 
3-Bu 

3-Me-5-H 

3-Me-5-NoH 

3-Pe 

3 -PheEt 

3-PhePr 

3-Pr 

3-R13 

(chrysenO) 

Simple 

(3) (Bl) (Bl) 

(3-Me)PhePr 

(3:4) 

(3:4) (Al) 



60 


3 


alkyl 


226 


1 


alkyl 


45 


4 


alkyl 


35 


7 


alkyl 


168 


1 


alkyl 


47 


4 


alkyl 


179 


1 


alkyl 


103 


2 


alkyl 


76 


2 


alkyl 


83 


2 


alkyl 


216 


1 


alkyl 


43 


8 


alkyl 


5 


15 


alkyl 


158 • 


1 


alkyl 


140 


1 


alkyl 


166 


1 


alkyl 


53 


3 


alkyl 


207 


1 


alkyl 


8 


13 


alkyl 


206 


1 


alkyl 


75 


3 


alkyl 


136 


1 


alkyl 


20 


8 


alkyl 


39 


7 


alkyl 


154 c 


1 


alkyl 


230 


1 


alkyl 


131 


2 


alkyl 


15 


9 


alkyl 


137 


1 


alkyl 


231 


1 


alkyl 


229 


1 


alkyl 


184 


1 


alkyl 


227 c 


1 


alkyl 


214 c 


1 


alkyl 


23 


7 


alkyl 


74 


3 


alkyl 


25 b 


6 


alkyl 


11 


10 


benzyl 


102 


2 


benzyl 


57 


3 


benzyl 


217 


2 


benzyl 


213 


1 


benzyl 


212 


1 


benzyl 


9 


13 


benzyl 


84 


2 


benzyl 


132 


2 


benzyl 


130 


2 


benzyl 


134 


2 


benzyl 


21 


7 


benzyl 


26 b 


6 


benzyl 


156 


1 


benzyl 


201 


1 


benzyl 


135 


2 


alkenyl 



(3:4) (Bl) 

(4) (Al) (A-tBu) (CI) (CI) 

(4) (Dl) (Dl) 
(4-Me)PhePr 
(4-iPe)PhePr 

(5) (Al) 

(5) (Bl) (E- (2-Ar-5-Me) 5het) 

(5) (B3) 

(5) (CI) (CI) 

(5) (C2) 

(5) (C2) (D2) (D2) 
(5:6) (D1/B1/F1) 
(5:7) 

(6) (B8) (CI) (El) (El) 

(6) (F-Ar) 
-(7) (A8) (Fl) 

(7) (D3) (D3) 

(8) (C3) 
(8:11) 

(9) (B4) (G3) 

(10) (Bl) (E5) (El) 
(10) (CI) (E5) (E2) 
(10+) (Bl) 

(11+) (Bl) 
(12) (A-PheEt) 
(12) (F6) (Fl) 

(12) (F6) (F6) 
(12 + ) 

(13) (E4) 
(A-Ar) (A-Ar)Bz 
(A-Bz) (A-Bz)PheEt 
(Al)PheEt 
(cholesterol) 
(cryptate) 
PheBu 

PheEt 
PhePr 
Simple. 

2.4.5- Me 

2.4.6- Me 

2-(3-(2-Et)Ar)Ar 

2-Et-3- (2, 3-Et-5-Me)Ar-5-Me 

2- R8-3-Naphthyl-4, 5-Benzo 
2/3 -Me 

3 . 4- Benzo 

3. 5- Me 

3- (4-Stilbenyl) Stilbenyl 

4- (3-Ar)Ar 
4-Et 

4 -Me 

4-PhePr 

4-tBu 

Ar. . (2-Et)Ar • 




220 


1 


alkenyl 


Ar . . (4-Bz)Ar 


116 


2 


alkenyl 


Ar . .Ar 


133 


2 


alkenyl 


Ar. .Bz 


110 


2 


alkenyl 


Et.CN.C0NH2 


87 


2 


alkenyl 


NH2.CN. N=NPh 


119 


2 


alkenyl 


P(NMe2)3 . .Ar 


120 


2 


alkenyl 


P(Pr)3. .Ar 


118 


2 


alkenyl 


P(iPe) 3 . .Ar 


51 


4 


alkenyl 


PCyhx3 . . Ar 


195 c 


1 


alkenyl 


PEt3. . (2-Bz)Ar 


3 lb 


6 


1 alkenyl 


PEt3 . .Ar 


194 


1 


alkenyl 


PEt3. .Bz 


109 


2 


alkenyl 


PheEt.CN.C0NH2 


101 


2 


cyclohexyl 


Simple 


149 


1 


cyclohexyl 


l-Me-2,4-CMe2 


55 


3 


cyclohexyl 


'2,3,4, 5-iBu 


147 


1 


cyclohexyl 


2,3,4-iBu-5-iPe 


209 


1 


cyclohexyl 


2- (3 , 4-PheEt) 5het-6-Me 


208 


1 


cyclohexyl 


2-Me-3,5-CMe2 


167 


1 


cyclohexyl 


2-Me-4-sPe 


165 


1 


cyclohexyl 


2-iPr-3, 5-Me 


150 


1 


cyclohexyl 


3-sPe-6-Me 


161 


1 


cyclohexyl 


4-Et-4-iBu 


219 


1 


cyclohexyl 


(complex) 


175 


1 


cyclopentyl 


2-Ar-4-spiro 


215 


1 


cyclopentyl 


3-PhePr 



a To generate these names, all heteroatoms are first replaced bv 
carbon (to produce the simplest common topology) and a particular 
structure is chosen from among these topologies as the "most typical" 
of that cluster, if possible to contain the largest substructure that 
distinguishes that cluster from all others. 

Within the name of a substitution, numbers indicate positions when 
substitution is on a ring, but chain length when substitution is on a 
chain (numbers separated by a colon indicate a range of chain 
lengths). Also, within a chain, letters indicate a position of 
substitution. (For example, (C2) describes a two atom branching from 
the third position of a chain, while 3-PhePr describes a phenyl 
propyl skeleton attached to the 3-position of a ring. ) 

A dot notation (.) separates the three possible substituents on an 
alkenyl root, the substituent order being same carbon as the -SH 
substituent, then the position trans to the -SH, and finally cis to -SH. 

The above notwithstanding, any name enclosed completely in 
parentheses takes its usual structural meaning. 



# 



Here are structural descriptions for each name abbreviation in the 
above table, mostly in SLN (SYBYL Line Notation), listed 
alphabetically. (SLN extends SMILES with the following concepts, 
among others. Hydrogens are explicit. Ring openings and closures 
begin with a number enclosed by [] and end with the matching 
number preceded by @. Other SLN symbols used in these SLN 
definitions are: ~ = any bond; - = single bond (used here to provide a 
reference for [R]) : = aromatic bond; ! = the SLN following (here in 
parentheses) is not allowed; [F] = no additional atoms may be 
attached to the preceding atom; [!R] = preceding bond may not be in 
a ring; [R] = preceding bond must be in a ring.) 

5het = 5Het = C[1]:C:C:C:C:&1. alkenyl = C=C. alkyl = C~[!R]C. aryl = 
Ar = Phe = Ph = C[1]:C:C:C:C:C@1. benzyl = Bz = HSC-[!R]C~[R]C. Bu = 
C-[!R]C-[!R]C-[!R]C-[!R]C. cyclohexyl = Cyhx = C[1](-I=)C~C~C~C~C~@1. 
cyclopentyl = C[1]~(-I=)C~C~C~C~@1. Et = C-[!R]C. inden = 
C[1]:C(~C~X-[2]):C(~@2):C:C@1. iBu = C-[!R]C-[!R]C(-[!R]Q-[!R]C. iPe = C- 
[!R]C-[!R]C-[!R]C(-[!R]C)-[!R]C. Me = C. naphth = 
C[1]:C(~C~X~[2]):C(~@2):C:C:C@1. NoH = !(CH). O denotes ring fusion, 
e.g., benzo fuses a 6-membered aromatic ring. Pe = C-[!R]C-[!R]C- 
[!R]C-[!R]C-[!R]C. Pr = C-[!R]C-[!R]C-['R]C. R# = alkyl chain of 
approximate length #. Simple = !(C~[!R]C). sPe = C(-[!R]C)-[!R]C-[!R]C 
[!R]C-[!R]C. Stilbenyl = C=[!R]C-[!R]C[1]:C:C:C:C:C@1. tBu = C(-[!R]C)(- 
[!R]C)-[!R]C. 



